Abstract. In many areas of imaging science, it is difficult to measure the phase of linear measurements. As such, one often
wishes to reconstruct a signal from intensity measurements, that is, perform phase retrieval. In this paper, we provide a novel
measurement design which is inspired by interferometry and exploits certain properties of expander graphs. We also give an
efficient phase retrieval procedure, and use recent results in spectral graph theory to produce a stable performance guarantee
that rivals the state of the art.

1. Introduction. Given a collection of vectors $\Phi = \{\varphi_\ell\}_{\ell=1}^N \subseteq \mathbb{C}^M$ and a signal $x \in \mathbb{C}^M$, consider
measurements of the form

$$z_\ell := |\langle x, \varphi_\ell\rangle|^2 + \nu_\ell, \tag{1.1}$$

where $\nu_\ell$ is noise; we call these noisy intensity measurements. Several areas of imaging science, such as X-ray
crystallography [30], diffraction imaging, astronomy [13] and optics [33], use measurements of
this form with the intent of reconstructing the original signal; this inverse problem is called phase retrieval.
Note that in the measurement process (1.1), we inherently lose some information about $x$. Indeed, for every
$\omega \in \mathbb{C}$ with $|\omega| = 1$, we see that $x$ and $\omega x$ produce the same intensity measurements. Thus, the best one
can hope to do with the intensity measurements of $x \in \mathbb{C}^M$ is reconstruct the class $[x] \in \mathbb{C}^M/\!\sim$, where $\sim$ is
the equivalence relation of being identical up to a global phase factor.
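The loss of global phase can be checked numerically. The following is a minimal sketch, assuming noiseless measurements and randomly drawn vectors; the helper name `intensity_measurements` and the sizes `M`, `N` are illustrative choices, not part of the paper.

```python
import cmath
import random

def intensity_measurements(x, phis):
    """Return |<x, phi>|^2 for each measurement vector phi (noiseless case)."""
    out = []
    for phi in phis:
        # <x, phi> is linear in x and conjugate-linear in phi
        inner = sum(xk * pk.conjugate() for xk, pk in zip(x, phi))
        out.append(abs(inner) ** 2)
    return out

rng = random.Random(0)
M, N = 3, 8  # illustrative sizes

def rand_vec():
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]

x = rand_vec()
phis = [rand_vec() for _ in range(N)]
omega = cmath.exp(0.7j)  # any unit-modulus scalar

z_x = intensity_measurements(x, phis)
z_wx = intensity_measurements([omega * xk for xk in x], phis)

# x and omega * x are indistinguishable from intensities alone
max_gap = max(abs(a - b) for a, b in zip(z_x, z_wx))
print(max_gap)
```

The printed gap is at the level of floating-point rounding, which is why only the class $[x]$ can be recovered.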



In practice, phase retrieval falls short of determining the original signal up to global phase. First of all, the
intensity measurement process that is used, say $\mathcal{A}\colon \mathbb{C}^M/\!\sim\, \to \mathbb{R}^N$ with $\mathcal{A}(x) = |\Phi^* x|^2$ (entrywise) and
viewing $\Phi = [\varphi_1 \cdots \varphi_N]$, often lacks injectivity, making it impossible to reconstruct uniquely. Moreover, the
phase retrieval algorithms that are used in practice take alternating projections onto the column space of $\Phi^*$
(to bring phase to the measurements) and onto the nonconvex set of vectors $y$ whose entry magnitudes match
the intensity measurements $|\Phi^* x|^2$ (to maintain fidelity in the magnitudes). The convergence
of these algorithms is particularly sensitive to the choice of initial phases, and so even if the measurement
design were injective, the phase retrieval procedure needs improvement.



These deficiencies have prompted two important lines of research in phase retrieval:

(i) For which measurement designs $\Phi$ is $[x] \mapsto |\Phi^* x|^2$ injective?

(ii) Given injectivity, how can $[x]$ be reconstructed stably and efficiently?

A first step toward solving (i) is determining how large $N$ must be in order for $\mathcal{A}$ to be injective. It remains
an open problem to find the smallest such $N$, but embedding results in differential geometry give that
$N \geq (4 + o(1))M$ is necessary [4], [22]. As for sufficiency, Balan, Casazza and Edidin [6] show that for almost
every choice of $\Phi$, $\mathcal{A}$ is injective whenever $N \geq 4M - 2$. However, Balan et al. do not suggest a signal
reconstruction process in such cases. Even if a general process existed for the injective case, there is no
guarantee it would be stable. Moreover, some instances of the phase retrieval problem are known to be
NP-complete [30], and so any general reconstruction process is necessarily inefficient, assuming P $\neq$ NP.

This leads one to attempt stable and efficient reconstruction with particular ensembles $\Phi$. Until recently,
this was only known to be possible in cases where $N = \Omega(M^2)$ [5]. By contrast, the state of the art comes
from Candès, Strohmer and Voroninski [10], who use semidefinite programming to stably reconstruct from
$N = O(M \log M)$ Gaussian-random measurements. There is other work in this vein [14], [33] which
also uses semidefinite programming and provides related guarantees. In theory, semidefinite programs are
solved via interior point methods, whose iterations are often rather expensive. In practice, one is inclined to
use faster numerical methods, but these lack performance guarantees.

In this paper, we propose an exchange of sorts: If you already have $O(M \log M)$ Gaussian-random measurements,
then we offer a faster reconstruction method with a stable performance guarantee, but at the price of
$O(M \log M)$ additional (non-adaptive) measurements. These new measurements are interferometry-inspired
combinations of the originals, and the speedups gained in reconstruction come from our use of different 
spectral methods. To help motivate our measurement design and phase retrieval procedure, we start in the 
next section by considering the simpler, noiseless case. In this case, the success of our method follows from 
a neat trick involving the polarization identity along with some well-known results in the theory of expander 
graphs. In Section 3, we modify the method to obtain provable stability in the noisy case; here, we exploit 
some recent developments in spectral graph theory. We give concluding remarks in Section 4, and provide 
the more technical proofs in the appendix. 

2. The noiseless case. In this section, we provide a new technique for phase retrieval. Here, we
specifically address the noiseless case, in which $\nu_\ell$ in (1.1) is zero for every $\ell = 1, \ldots, N$; this case will give
some intuition for a more stable version of our techniques, which we introduce in the next section. In the
noiseless case, we will use on the order of the fewest measurements possible, namely $N = O(M)$, where $M$
is the dimension of the signal.

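Before stating our measurement design and phase retrieval procedure, we motivate both with some discussion.
Take a finite set $V$, and suppose we take intensity measurements of $x \in \mathbb{C}^M$ with a spanning set $\Phi_V :=
\{\varphi_i\}_{i \in V} \subseteq \mathbb{C}^M$. Again, we wish to recover $x$ up to a global phase factor. Having $|\langle x, \varphi_i\rangle|$ for every $i \in V$,
we claim it suffices to determine the relative phase between $\langle x, \varphi_i\rangle$ and $\langle x, \varphi_j\rangle$ for all pairs $i \neq j$. Indeed, if
we had this information, we could arbitrarily assign some nonzero coefficient $c_i = |\langle x, \varphi_i\rangle|$ to have positive
phase. If $\langle x, \varphi_j\rangle$ is also nonzero, then it has well-defined relative phase

$$\omega_{ij} := \Big(\frac{\langle x, \varphi_i\rangle}{|\langle x, \varphi_i\rangle|}\Big)^{-1} \frac{\langle x, \varphi_j\rangle}{|\langle x, \varphi_j\rangle|}, \tag{2.1}$$

which determines the coefficient by multiplication: $c_j = \omega_{ij}|\langle x, \varphi_j\rangle|$. Otherwise when $\langle x, \varphi_j\rangle = 0$, we
naturally take $c_j = 0$, and for notational convenience, we arbitrarily take $\omega_{ij} = 1$. From here, the original
signal's equivalence class $[x] \in \mathbb{C}^M/\!\sim$ can be identified by applying the canonical dual frame $\{\tilde\varphi_j\}_{j \in V}$,
namely the Moore-Penrose pseudoinverse, of $\Phi_V$:

$$\sum_{j \in V} c_j \tilde\varphi_j = \Big(\frac{\langle x, \varphi_i\rangle}{|\langle x, \varphi_i\rangle|}\Big)^{-1} \sum_{j \in V} \langle x, \varphi_j\rangle \tilde\varphi_j = \Big(\frac{\langle x, \varphi_i\rangle}{|\langle x, \varphi_i\rangle|}\Big)^{-1} x \in [x]. \tag{2.2}$$

Having established the utility of the relative phase between coefficients, we now seek some method of extracting
this information. To this end, we turn to a special version of the polarization identity: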

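Lemma 2.1 (Mercedes-Benz Polarization Identity). Take $\omega := e^{2\pi i/3}$. Then for any $a, b \in \mathbb{C}$,

$$\overline{a}b = \frac{1}{3} \sum_{k=0}^{2} \omega^k |a + \omega^{-k} b|^2. \tag{2.3}$$

Proof. We start by expanding the right-hand side of (2.3):

$$\mathrm{RHS} := \frac{1}{3} \sum_{k=0}^{2} \omega^k |a + \omega^{-k} b|^2 = \frac{1}{3} \sum_{k=0}^{2} \omega^k \big(|a|^2 + 2\operatorname{Re}(\omega^{-k}\overline{a}b) + |b|^2\big) = \frac{2}{3} \sum_{k=0}^{2} \omega^k \operatorname{Re}(\omega^{-k}\overline{a}b),$$

where the last step uses $\sum_{k=0}^{2} \omega^k = 0$. Multiplying, we find

$$\operatorname{Re}(\omega^{-k}\overline{a}b) = \operatorname{Re}(\omega^{-k})\operatorname{Re}(\overline{a}b) - \operatorname{Im}(\omega^{-k})\operatorname{Im}(\overline{a}b) = \operatorname{Re}(\omega^{k})\operatorname{Re}(\overline{a}b) + \operatorname{Im}(\omega^{k})\operatorname{Im}(\overline{a}b).$$

We substitute this into our expression for RHS:

$$\operatorname{Re}(\mathrm{RHS}) = \frac{2}{3}\Big[\operatorname{Re}(\overline{a}b) \sum_{k=0}^{2} (\operatorname{Re}(\omega^k))^2 + \operatorname{Im}(\overline{a}b) \sum_{k=0}^{2} \operatorname{Re}(\omega^k)\operatorname{Im}(\omega^k)\Big],$$

$$\operatorname{Im}(\mathrm{RHS}) = \frac{2}{3}\Big[\operatorname{Re}(\overline{a}b) \sum_{k=0}^{2} \operatorname{Re}(\omega^k)\operatorname{Im}(\omega^k) + \operatorname{Im}(\overline{a}b) \sum_{k=0}^{2} (\operatorname{Im}(\omega^k))^2\Big].$$

Finally, we apply the following easy-to-verify identities:

$$\sum_{k=0}^{2} (\operatorname{Re}(\omega^k))^2 = \sum_{k=0}^{2} (\operatorname{Im}(\omega^k))^2 = \frac{3}{2}, \qquad \sum_{k=0}^{2} \operatorname{Re}(\omega^k)\operatorname{Im}(\omega^k) = 0,$$

which yield $\mathrm{RHS} = \overline{a}b$. □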
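As a sanity check on the identity, the right-hand side of (2.3) can be evaluated for arbitrary complex numbers. This is a small sketch; the function name `polarization_rhs` and the sample values of `a` and `b` are illustrative assumptions.

```python
import cmath

OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity

def polarization_rhs(a, b):
    """Right-hand side of (2.3): (1/3) * sum_k omega^k |a + omega^{-k} b|^2."""
    return sum(OMEGA**k * abs(a + OMEGA**(-k) * b) ** 2 for k in range(3)) / 3

a, b = 1.5 - 0.5j, -0.25 + 2.0j   # arbitrary test values
lhs = a.conjugate() * b           # left-hand side of (2.3)
rhs = polarization_rhs(a, b)
print(abs(lhs - rhs))
```

Only the three squared moduli enter the right-hand side, which is what makes the identity usable with intensity measurements.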

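The above polarization identity can also be proved by viewing $\{\omega^k\}_{k=0}^{2}$ as a Mercedes-Benz frame in $\mathbb{R}^2$
and $\frac{2}{3}\sum_{k=0}^{2} \omega^k \operatorname{Re}(\omega^{-k}u)$ as the corresponding reconstruction formula for $u \in \mathbb{C} = \mathbb{R}^2$. We can now use this
polarization identity to determine relative phase (2.1):

$$\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle = \frac{1}{3}\sum_{k=0}^{2} \omega^k \big|\langle x, \varphi_i\rangle + \omega^{-k}\langle x, \varphi_j\rangle\big|^2 = \frac{1}{3}\sum_{k=0}^{2} \omega^k \big|\langle x, \varphi_i + \omega^k\varphi_j\rangle\big|^2. \tag{2.4}$$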



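Thus, if in addition to $\Phi_V$ we measure with $\{\varphi_i + \omega^k\varphi_j\}_{k=0}^{2}$, we can use (2.4) to determine $\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle$
and then normalize to get the relative phase:

$$\omega_{ij} = \frac{\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle}{\big|\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle\big|}, \tag{2.5}$$

provided both $\langle x, \varphi_i\rangle$ and $\langle x, \varphi_j\rangle$ are nonzero. This idea of extracting information from interfering intensity
measurements is not new; for example, prior work uses intensity measurements of multiple unknown signals and
their interferences to recover them individually. To summarize our discussion of reconstructing a single
signal, if we measure with $\Phi_V$ and $\{\varphi_i + \omega^k\varphi_j\}_{k=0}^{2}$ for every pair $i, j \in V$, then we can recover $[x]$. However,
such a method uses $|V| + 3\binom{|V|}{2}$ measurements, and since $\Phi_V$ must span $\mathbb{C}^M$, we necessarily have $|V| \geq M$
and thus a total of $\Omega(M^2)$ measurements.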
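The step from three interference intensities to a relative phase can be sketched as follows. This is an illustration under the noiseless assumption, with the inner product taken linear in $x$ and conjugate-linear in the measurement vector; the names `edge_intensities` and `relative_phase` are hypothetical helpers.

```python
import cmath
import random

OMEGA = cmath.exp(2j * cmath.pi / 3)

def inner(x, phi):
    """<x, phi>, linear in x and conjugate-linear in phi."""
    return sum(xk * pk.conjugate() for xk, pk in zip(x, phi))

def edge_intensities(x, phi_i, phi_j):
    """Noiseless intensities |<x, phi_i + omega^k phi_j>|^2 for k = 0, 1, 2."""
    out = []
    for k in range(3):
        v = [p + OMEGA**k * q for p, q in zip(phi_i, phi_j)]
        out.append(abs(inner(x, v)) ** 2)
    return out

def relative_phase(intensities):
    """Combine the three intensities as in (2.4), then normalize as in (2.5)."""
    prod = sum(OMEGA**k * z for k, z in enumerate(intensities)) / 3
    return prod / abs(prod)  # assumes both vertex measurements are nonzero

rng = random.Random(2)
M = 3

def rand_vec():
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]

x, phi_i, phi_j = rand_vec(), rand_vec(), rand_vec()
w_est = relative_phase(edge_intensities(x, phi_i, phi_j))

# ground truth, available here only because the example knows x
prod_true = inner(x, phi_i).conjugate() * inner(x, phi_j)
w_true = prod_true / abs(prod_true)
print(abs(w_est - w_true))
```

Each pair costs three extra measurements, which is the source of the $3\binom{|V|}{2}$ term counted above.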

In pursuit of $O(M)$ measurements, take some simple graph $G = (V, E)$, arbitrarily assign a direction to each
edge, and only take measurements with $\Phi_V$ and $\Phi_E := \bigcup_{(i,j) \in E} \{\varphi_i + \omega^k\varphi_j\}_{k=0}^{2}$. To recover $[x]$, we again
arbitrarily assign some nonzero vertex measurement to have positive phase, and then we propagate relative
phase information along the edges by multiplication to determine the phase of the other vertex measurements
relative to the original vertex measurement:

$$\omega_{ik} = \omega_{ij}\omega_{jk}. \tag{2.6}$$

However, if $x$ is orthogonal to a given vertex vector, then that measurement is zero, and so relative phase
information cannot propagate through the corresponding vertex; indeed, such orthogonality has the effect
of removing the vertex from the graph, and for some graphs, this will prevent recovery. For example, if $G$ is
a star, then $x$ could be orthogonal to the vector corresponding to the internal vertex, whose removal would
render the remaining graph edgeless. That said, we should select $\Phi_V$ and $G$ so as to minimize the impact of
orthogonality with vertex vectors.

First, we can take $\Phi_V$ to be full spark, that is, $\Phi_V$ has the property that every subcollection of $M$ vectors
spans. Full spark frames appear in a wide variety of applications. Explicit deterministic constructions of
them are given in [1], [28]. For example, we can select the first $M$ rows of the $|V| \times |V|$ discrete Fourier
transform matrix, and take $\Phi_V$ to be the columns of the resulting $M \times |V|$ matrix; in this case, the fact that
$\Phi_V$ is full spark follows from the Vandermonde determinant formula. In our application, $\Phi_V$ being full spark
will be useful for two reasons. First, this implies that $x \neq 0$ is orthogonal to at most $M - 1$ members of
$\Phi_V$, thereby limiting the extent of $x$'s damage to our graph. Additionally, $\Phi_V$ being full spark frees us from
requiring the graph to be connected after the removal of vertices; indeed, any remaining component of size
$M$ or more will correspond to a subcollection of $\Phi_V$ that spans, meaning it has a dual frame to reconstruct
with. It remains to find a graph of $O(M)$ vertices and edges that maintains a size-$M$ component after the
removal of any $M - 1$ vertices.

To this end, we consider a well-studied family of sparse graphs known as expander graphs. We choose these
graphs for their notably strong connectivity properties. There is a combinatorial definition of expander
graphs, but we will focus on the spectral definition. Given a $d$-regular graph $G$ of $n$ vertices, consider its
adjacency matrix $A$, and define the Laplacian to be $L := I - \frac{1}{d}A$; if $G$ were not regular, we would consider
the diagonal matrix $D$ of vertex degrees and define the Laplacian to be $L := I - D^{-1/2}AD^{-1/2}$. This is
often called the normalized Laplacian in the literature, but we make no distinction here. We are particularly
interested in the eigenvalues of the Laplacian: $0 = \lambda_1 \leq \cdots \leq \lambda_n$. The second eigenvalue $\lambda_2$ of the Laplacian
is called the spectral gap of the graph, and as we shall see, this value is particularly useful in evaluating the
graph's connectivity. We say $G$ has expansion $\lambda$ if $\{\lambda_2, \ldots, \lambda_n\} \subseteq [1 - \lambda, 1 + \lambda]$; note that since $1 - \lambda \leq \lambda_2$,
small expansion implies large spectral gap. Furthermore, a family of $d$-regular graphs $\{G_i\}_{i=1}^{\infty}$ is a spectral
expander family if there exists $c < 1$ such that every $G_i$ has expansion $\lambda(G_i) \leq c$. Since $d$ is constant over an
expander family, expanders with many vertices have particularly few edges. There are many results which
describe the connectivity of expanders, but the following is particularly relevant to our application:

Lemma 2.2 (Spectral gap grants connectivity [21]). Consider a $d$-regular graph $G$ of $n$ vertices with spectral
gap $\lambda_2$. For all $\varepsilon \leq \frac{\lambda_2}{6}$, removing any $\varepsilon dn$ edges from $G$ results in a connected component of size $\geq (1 - \frac{2\varepsilon}{\lambda_2})n$.

Note that removing $\varepsilon n$ vertices from a $d$-regular graph necessarily removes $\leq \varepsilon dn$ edges, and so this lemma
directly applies. For our application, we want to guarantee that the removal of any $M - 1$ vertices maintains
a size-$M$ component. To do this, we will ensure both (i) $M - 1 \leq \varepsilon n$ and (ii) $M - 1 < (1 - \frac{2\varepsilon}{\lambda_2})n$, and then
invoke the above lemma. Note that since $n \geq M \geq 2$,

$$\varepsilon \leq \frac{\lambda_2}{6} \leq \frac{\operatorname{Tr}[L]}{6(n-1)} = \frac{n}{6(n-1)} \leq \frac{1}{3} < \frac{2}{3} \leq 1 - \frac{2\varepsilon}{\lambda_2},$$

where the last inequality is a rearrangement of $\varepsilon \leq \frac{\lambda_2}{6}$. Thus $\varepsilon n < (1 - \frac{2\varepsilon}{\lambda_2})n$, meaning (i) implies (ii), and so it
suffices to have $M \leq \varepsilon n + 1$. Overall, we use the following criteria to pick our expander graph: Given the signal
dimension $M$, use a $d$-regular graph $G = (V, E)$ of $n$ vertices with spectral gap $\lambda_2$ such that $M \leq (\frac{\lambda_2}{6})n + 1$.
Then by the previous discussion, the total number of measurements is $N = |V| + 3|E| = (\frac{3}{2}d + 1)n$.
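The selection criterion can be evaluated directly for a given graph. The sketch below assumes NumPy is available and uses a complete graph purely as a small regular example (it is not a sparse expander); `spectral_gap` and `design_size` are illustrative names.

```python
import numpy as np

def spectral_gap(A):
    """Second-smallest eigenvalue of L = I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    s = 1.0 / np.sqrt(deg)                      # assumes no isolated vertices
    L = np.eye(len(A)) - s[:, None] * A * s[None, :]
    return np.linalg.eigvalsh(L)[1]             # eigenvalues come back ascending

def design_size(A):
    """Largest M with M <= (lambda_2 / 6) n + 1, and N = |V| + 3|E|."""
    n = len(A)
    d = int(A[0].sum())                         # degree, assuming A is d-regular
    lam2 = spectral_gap(A)
    M_max = int(np.floor(lam2 * n / 6 + 1))
    N = n + 3 * (n * d // 2)
    return lam2, M_max, N

n = 13
A = np.ones((n, n)) - np.eye(n)                 # complete graph K_13, 12-regular
lam2, M_max, N = design_size(A)
print(lam2, M_max, N)
```

For the complete graph the spectral gap is $n/(n-1)$, and the measurement count agrees with $(\frac{3}{2}d + 1)n$.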

Recall that we seek $N = O(M)$ measurements. To minimize the redundancy $\frac{N}{M}$ for a fixed degree $d$, we
would like a maximal spectral gap $\lambda_2$, and it suffices to seek minimal spectral expansion $\lambda$. Spectral graph
families known as Ramanujan graphs are asymptotically optimal in this sense; taking $\mathcal{G}_n^d$ to be the set of
connected $d$-regular graphs with $\geq n$ vertices, Alon and Boppana (see [2]) showed that for any fixed $d$,

$$\lim_{n \to \infty} \inf_{G \in \mathcal{G}_n^d} \lambda(G) \geq \frac{2\sqrt{d-1}}{d},$$

while Ramanujan graphs are defined to have spectral expansion $\leq \frac{2\sqrt{d-1}}{d}$. To date, Ramanujan graphs have
only been constructed for certain values of $d$. One important construction was given by Lubotzky, Phillips,
and Sarnak [25], which produces a Ramanujan family whenever $d - 1 \equiv 1 \bmod 4$ is prime. Among these
graphs, we get the smallest redundancy $\frac{N}{M}$ when $M = \lfloor (1 - \frac{2\sqrt{d-1}}{d})\frac{n}{6} + 1 \rfloor$ and $d = 6$:

$$\frac{N}{M} \leq \frac{(\frac{3}{2}d + 1)n}{(1 - \frac{2\sqrt{d-1}}{d})\frac{n}{6}} = 45(3 + \sqrt{5}) \approx 235.62.$$

Thus, in such cases, our techniques allow for phase retrieval with only $N \leq 236M$ measurements. However,
the number of vertices in each Ramanujan graph from [25] is of the form $q(q^2 - 1)$ or $\frac{q(q^2-1)}{2}$, where
$q \equiv 1 \bmod 4$ is prime, and so any bound on redundancy using these graphs will only be valid for particular
values of $M$.

In order to get $N = O(M)$ in general, we use the fact that random graphs are nearly Ramanujan with high
probability. In particular, for every $\varepsilon > 0$ and even $d$, a random $d$-regular graph has spectral expansion
$\lambda \leq \frac{2\sqrt{d-1}+\varepsilon}{d}$ with high probability as $n \to \infty$ [17]. Thus, picking $\varepsilon$ and $d$ to satisfy $\frac{2\sqrt{d-1}+\varepsilon}{d} < 1$, we may
take $M = \lfloor (1 - \frac{2\sqrt{d-1}+\varepsilon}{d})\frac{n}{6} + 1 \rfloor$ to get

$$\frac{N}{M} \leq \frac{(\frac{3}{2}d + 1)n}{(1 - \frac{2\sqrt{d-1}+\varepsilon}{d})\frac{n}{6}},$$

and this choice will satisfy $M \leq (\frac{\lambda_2}{6})n + 1$ with high probability. To see how small this redundancy is, note
that taking $\varepsilon = 0.1$ and $d = 8$ gives $N \leq 240M$. While the desired expansion properties of a random graph
are only present with high probability, estimating the spectral gap is inexpensive, and so it is computationally
feasible to verify whether a randomly drawn graph is good enough. Moreover, $n$ can be any sufficiently large
integer, and so the above bound is valid for all sufficiently large $M$, i.e., our procedure can perform phase
retrieval with $N = O(M)$ measurements in general.
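The two redundancy figures quoted above follow from the same expression, which can be evaluated as a quick arithmetic check. The function name `redundancy_bound` is an illustrative choice.

```python
import math

def redundancy_bound(d, eps=0.0):
    """Evaluate (3d/2 + 1) / ((1 - (2 sqrt(d-1) + eps)/d) / 6)."""
    expansion = (2 * math.sqrt(d - 1) + eps) / d   # bound on spectral expansion
    gap = 1 - expansion                            # resulting lower bound on lambda_2
    if gap <= 0:
        raise ValueError("need (2*sqrt(d-1) + eps)/d < 1")
    return (1.5 * d + 1) / (gap / 6)

ramanujan_d6 = redundancy_bound(6)          # Ramanujan case, d = 6
random_d8 = redundancy_bound(8, eps=0.1)    # random regular graph, d = 8
print(round(ramanujan_d6, 2), round(random_d8, 2))
```

The first value matches $45(3 + \sqrt{5}) \approx 235.62$, and the second stays below 240.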

Combining this with the above discussion, we have the following measurement design and phase retrieval 
procedure: 

Measurement design (noiseless case)

• Fix $d$ even and $\varepsilon > 0$.

• Given $M$, pick some $d$-regular graph $G = (V, E)$ with spectral gap $\lambda_2 \geq \lambda' := 1 - \frac{2\sqrt{d-1}+\varepsilon}{d}$ and
$|V| = \lceil \frac{6}{\lambda'}(M - 1) \rceil$, and arbitrarily direct the edges.

• Design the measurements $\Phi := \Phi_V \cup \Phi_E$ by taking $\Phi_V := \{\varphi_i\}_{i \in V} \subseteq \mathbb{C}^M$ to be full spark and
$\Phi_E := \bigcup_{(i,j) \in E} \{\varphi_i + \omega^k\varphi_j\}_{k=0}^{2}$.

Phase retrieval procedure (noiseless case)

• Given $\{|\langle x, \varphi\rangle|^2\}_{\varphi \in \Phi}$, delete the vertices $i \in V$ with $|\langle x, \varphi_i\rangle|^2 = 0$.

• In the remaining induced subgraph, find a connected component of $\geq M$ vertices $V'$.

• Pick a vertex in $V'$ to have positive phase and propagate/multiply relative phases (2.6), which are
calculated by normalizing (2.4), see (2.5).

• Having $\{\langle x, \varphi_i\rangle\}_{i \in V'}$ up to a global phase factor, find the least-squares estimate of $[x]$ by applying
the Moore-Penrose pseudoinverse of $\{\varphi_i\}_{i \in V'}$, see (2.2).
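The steps above can be sketched end to end. This is a minimal illustration, not the paper's implementation: it assumes NumPy, uses a small 4-regular circulant graph as a stand-in for an expander, draws Gaussian vertex vectors (full spark with probability 1), and the function names `measure` and `recover` are hypothetical.

```python
import numpy as np
from collections import deque

OMEGA = np.exp(2j * np.pi / 3)

def measure(x, Phi, edges):
    """Noiseless vertex and edge intensities; the columns of Phi are the phi_i."""
    vertex = np.abs(Phi.conj().T @ x) ** 2
    edge = {(i, j): [np.abs(np.vdot(Phi[:, i] + OMEGA**k * Phi[:, j], x)) ** 2
                     for k in range(3)]
            for (i, j) in edges}
    return vertex, edge

def recover(vertex, edge, Phi, tol=1e-12):
    M, n = Phi.shape
    alive = vertex > tol                         # delete vertices with zero measurement
    nbrs = {i: [] for i in range(n) if alive[i]}
    for (i, j), z in edge.items():
        if alive[i] and alive[j]:
            prod = sum(OMEGA**k * z[k] for k in range(3)) / 3   # (2.4)
            w = prod / abs(prod)                                # (2.5)
            nbrs[i].append((j, w))
            nbrs[j].append((i, np.conj(w)))
    seen = set()
    for root in nbrs:
        if root in seen:
            continue
        phase = {root: 1.0 + 0j}                 # root gets positive phase
        queue = deque([root])
        while queue:                             # breadth-first propagation, (2.6)
            i = queue.popleft()
            for j, w in nbrs[i]:
                if j not in phase:
                    phase[j] = phase[i] * w
                    queue.append(j)
        seen |= set(phase)
        if len(phase) >= M:
            idx = sorted(phase)
            c = np.array([phase[i] * np.sqrt(vertex[i]) for i in idx])
            # least-squares solve of Phi_{V'}^* y = c, i.e. apply the pseudoinverse
            y, *_ = np.linalg.lstsq(Phi[:, idx].conj().T, c, rcond=None)
            return y
    raise ValueError("no connected component with at least M vertices")

rng = np.random.default_rng(0)
M, n = 4, 12
Phi = (rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))) / np.sqrt(2 * M)
edges = [(i, (i + s) % n) for i in range(n) for s in (1, 2)]   # circulant stand-in graph
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)

vertex, edge = measure(x, Phi, edges)
y = recover(vertex, edge, Phi)
align = np.vdot(y, x) / abs(np.vdot(y, x))       # remove the global phase before comparing
err = np.linalg.norm(x - align * y) / np.linalg.norm(x)
print(err)
```

The only nonlinear work is the normalization of each edge combination; everything after phase propagation is a linear solve.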

Note that this phase retrieval procedure is particularly fast. Indeed, if we use $E \subseteq V^2$ to store $G$, then we can
delete vertices $i \in V$ with $|\langle x, \varphi_i\rangle|^2 = 0$ by deleting the edges for which (2.4) is zero, which takes $O(|E|)$ time.
Next, if the members of $E$ are ordered lexicographically, the remaining subgraph can be easily partitioned
into connected components in $O(|E|)$ time by collecting edges with common vertices, and then propagating
relative phase in the largest component is performed in $O(|E|)$ time using a depth- or breadth-first search.
Overall, we only use $O(M)$ time before the final least-squares step of the phase retrieval procedure, which
happens to be the bottleneck, depending on the subcollection $\Phi_{V'}$. In general, we can find the least-squares
estimate in $O(M^3)$ time using Gaussian elimination, but if $\Phi_V$ has special structure (e.g., it is a submatrix
of the discrete Fourier transform matrix), then one might exploit that structure to gain speedups (e.g., use
the fast Fourier transform in conjunction with an iterative method). Regardless, our procedure reduces the
nonlinear phase retrieval problem to the much simpler problem of solving an overdetermined linear system.

While this measurement design and phase retrieval procedure is particularly efficient, it certainly lacks
stability. Perhaps most notably, we have not imposed anything on $\Phi_V$ that guarantees stability with inverting
$\Phi_{V'}$; indeed, we have merely enforced linear independence between vectors, while stability will require
well-conditioning. Another noteworthy source of instability is our method of phase propagation, which naturally
accumulates error; it would be better if the relative phases were combined using a more democratic
process that encourages noise cancellation. In the next section, we will address these concerns (and others)
and modify our procedure accordingly; the revised procedure will be stable, but at the price of a log factor
in the number of measurements: $N = O(M \log M)$. As we mention in the concluding remarks, we do not
think this log factor is necessary, but we leave this pursuit for future work.

3. The noisy case. In this section, we consider a noise-robust version of the measurement design and 
phase retrieval procedure of the previous section. In the end, the measurement design will be nearly identical: 
vertex measurements will be independent complex Gaussian vectors (thereby being full spark with probability 
1), and the edge measurements will be the same sort of linear combinations of vertex measurements. Our use 
of randomness in this version will enable the vertex measurements to simultaneously satisfy two important 
conditions with high probability: projective uniformity with noise and numerical erasure robustness. Before 
defining these conditions, we motivate them by considering a noisy version of our phase retrieval procedure. 

Recall that our noiseless procedure starts by removing the vertices $i \in V$ for which $|\langle x, \varphi_i\rangle|^2 = 0$. Indeed,
since we plan to propagate relative phase information along edges, these 0-vertices are of no use, as relative
phase with these vertices is not well defined. Since we calculate relative phase by normalizing (2.4), we see
that relative phase is sensitive to perturbations when (2.4) is small, meaning either $\langle x, \varphi_i\rangle$ or $\langle x, \varphi_j\rangle$ is small.
As such, while 0-vertices provide no relative phase information in the noiseless case, small vertices provide
unreliable information in the noisy case, and so we wish to remove them accordingly. However, we also want
to ensure that there are only a few small vertices. In the noiseless case, we limit the number of 0-vertices by
using a full spark frame; in the noisy case, we make use of a new concept we call projective uniformity:

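Definition 3.1. The $\alpha$-projective uniformity of $\Phi = \{\varphi_i\}_{i=1}^{n} \subseteq \mathbb{C}^M$ is given by

$$\mathrm{PU}(\Phi; \alpha) = \min_{\substack{x \in \mathbb{C}^M \\ \|x\| = 1}} \; \max_{\substack{\mathcal{I} \subseteq \{1, \ldots, n\} \\ |\mathcal{I}| \geq \alpha n}} \; \min_{i \in \mathcal{I}} |\langle x, \varphi_i\rangle|^2.$$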



In words, projective uniformity gives the following guarantee: For every unit-norm signal $x$, there exists
a collection of vertices $\mathcal{I} \subseteq V$ of size at least $\alpha|V|$ such that $|\langle x, \varphi_i\rangle|^2 \geq \mathrm{PU}(\Phi_V; \alpha)$ for every $i \in \mathcal{I}$. As
such, projective uniformity effectively limits the total number of small vertices possible, at least before the
measurements are corrupted by noise. However, the phase retrieval algorithm will only have access to noisy
versions of the measurements, and so we must account for this subtlety in our procedure. In an effort to
isolate the reliable pieces of relative phase information, we will remove the vertices corresponding to small
noisy edge combinations (2.4):

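Algorithm 3.2 (Pruning for reliability). Given a graph $G = (V, E)$ and a function $f\colon E \to \mathbb{R}$ such that
$f(i,j) = |\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle + \varepsilon_{ij}|$, initialize $H \leftarrow G$. Do the following procedure $\lfloor (1 - \alpha)|V| \rfloor$ times: Find the
minimizer $(i,j)$ of $f$ among the edges remaining in $H$, and update $H \leftarrow H \setminus \{i, j\}$. When done, output $H$.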
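A direct transcription of this pruning loop might look as follows. It is a sketch under the reading that each step removes both endpoints of the smallest remaining edge; the dictionary-based graph representation and the toy weights are assumptions made for illustration.

```python
import math

def prune_for_reliability(vertices, f, alpha):
    """f maps each edge (i, j) to |noisy edge combination|; returns the pruned graph."""
    remaining_v = set(vertices)
    remaining_e = dict(f)
    for _ in range(math.floor((1 - alpha) * len(vertices))):
        if not remaining_e:
            break                                 # nothing left to prune
        i, j = min(remaining_e, key=remaining_e.get)
        remaining_v -= {i, j}                     # H <- H \ {i, j}
        remaining_e = {e: w for e, w in remaining_e.items()
                       if i not in e and j not in e}
    return remaining_v, remaining_e

# toy 8-cycle with two unreliable (small) edges
f = {(0, 1): 0.05, (1, 2): 1.0, (2, 3): 1.2, (3, 4): 0.9,
     (4, 5): 0.02, (5, 6): 1.1, (6, 7): 1.3, (7, 0): 0.8}
V_pruned, E_pruned = prune_for_reliability(range(8), f, alpha=0.75)
print(sorted(V_pruned), sorted(E_pruned))
```

With $\alpha = 0.75$ and eight vertices the loop runs twice, so the endpoints of the two smallest edges are discarded and only edges with large $f$ survive.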

We now explain why only reliable pieces of relative phase information will remain after running the above
algorithm, provided $\Phi_V$ has sufficient projective uniformity. The main idea is captured in the following:

Lemma 3.3. Define $\|\theta\|_{\mathbb{T}} := \min_{k \in \mathbb{Z}} |\theta - 2\pi k|$ for all angles $\theta \in \mathbb{R}/2\pi\mathbb{Z}$. Then for any $z, \varepsilon \in \mathbb{C}$,

$$\|\arg(z + \varepsilon) - \arg(z)\|_{\mathbb{T}} \leq \pi\frac{|\varepsilon|}{|z|}.$$



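Proof. If $|\varepsilon| \geq |z|$, the result is trivial. Suppose $|\varepsilon| < |z|$, and consider the triangle whose vertices are $0$, $z$,
and $z + \varepsilon$. Denoting the angle at $0$ by $\theta = \|\arg(z + \varepsilon) - \arg(z)\|_{\mathbb{T}}$ and the angle at $z + \varepsilon$ by $\alpha$, the law of
sines gives

$$\frac{\sin\theta}{|\varepsilon|} = \frac{\sin\alpha}{|z|}. \tag{3.1}$$

Next, the law of cosines gives

$$2|z||z + \varepsilon|\cos\theta = |z|^2 + |z + \varepsilon|^2 - |\varepsilon|^2 > |z + \varepsilon|^2 \geq 0,$$

and so $\cos\theta > 0$, i.e., $\theta \in [0, \frac{\pi}{2})$. Finally, by the concavity of $\sin(\cdot)$ and then (3.1), we conclude that

$$\frac{2}{\pi}\theta \leq \sin\theta \leq \frac{|\varepsilon|}{|z|},$$

which implies the result. □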



By taking $z = \overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle + \varepsilon_{ij}$ and $\varepsilon = -\varepsilon_{ij}$, we can use this lemma to bound the relative phase error
we incur when normalizing $z$. In fact, consider the minimum of $f$ when Algorithm 3.2 is complete. Since
the algorithm deletes vertices from $G$ according to the input signal $x$, this minimum will vary with $x$; let
PUN denote the smallest possible minimum value. Then the relative phase error incurred with $(i,j) \in E$ is
no more than $\pi|\varepsilon_{ij}|/\mathrm{PUN}$, regardless of the signal measured. Indeed, our use of projective uniformity with
noise (i.e., PUN) is intended to bound the instability that comes with normalizing small values of (2.4). As
we will show in the appendix, PUN can be bounded below by using the projective uniformity of $\Phi_V$, and
furthermore, a complex Gaussian $\Phi_V$ has projective uniformity with overwhelming probability.
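The phase perturbation bound of Lemma 3.3 is easy to probe numerically. The sketch below samples random pairs $(z, \varepsilon)$ and records the largest observed ratio between the phase error and $\pi|\varepsilon|/|z|$; the sampling distribution and the helper `circle_dist` are illustrative assumptions.

```python
import cmath
import math
import random

def circle_dist(theta):
    """Distance from theta to the nearest integer multiple of 2*pi."""
    r = abs(math.fmod(theta, 2 * math.pi))
    return min(r, 2 * math.pi - r)

rng = random.Random(1)
worst_ratio = 0.0
for _ in range(10000):
    z = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    e = complex(rng.gauss(0, 1), rng.gauss(0, 1)) * rng.random()
    if z == 0 or e == 0 or z + e == 0:
        continue                                  # phases undefined or bound vacuous
    phase_err = circle_dist(cmath.phase(z + e) - cmath.phase(z))
    bound = math.pi * abs(e) / abs(z)
    worst_ratio = max(worst_ratio, phase_err / bound)
print(worst_ratio)
```

A ratio that never exceeds 1 is consistent with the lemma; it also shows why dividing by a small $|z|$ is the dangerous case that pruning is meant to avoid.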

After applying Algorithm 3.2, our graph will have slightly fewer vertices, but the remaining edges will
correspond to reliable pieces of relative phase information. Recall that we plan to use this information on
the edges to determine phases for the vertices, and we want to do this in a stable way. To understand
when this is even possible, we first consider a few simple scenarios. Suppose that after removing vertices
with Algorithm 3.2, the graph has a vertex of degree 0. Then we have no information about the phase
of this vertex, and it should be removed accordingly. For a less extreme scenario, suppose the vertex has 
degree 1. Then any noise in the corresponding edge measurement would be passed directly to the vertex, 
which inherently lacks stability compared to the noise cancellation that would come with more edges. More 
generally, if the graph has a cut vertex (e.g., the neighbor of a degree-1 vertex), then we would need to rely 
on the correctness of this lone vertex to ensure consistency between the parts of the graph it connects — this 
scenario is also rather unstable. After considering these examples, it makes intuitive sense that stability 
necessitates a high level of connectivity in the graph, regardless of the algorithm used to extrapolate the 
vertex phases. 

As such, we seek to remove a small proportion of vertices so that the remaining graph is very connected, 
i.e., has large spectral gap. To do this, we will iteratively remove sets of vertices that are poorly connected 
to the rest of the graph. These sets will be identified using spectral clustering, a process which is strongly 
motivated by an inequality in Riemannian geometry by Cheeger [11] and which has performance guarantees
originating with Alon [2], [3]. The main idea of spectral clustering follows the intuition that a random walk on
a graph tends to be trapped in sections of the graph which have few connections to the rest of the vertices. 
Moreover, the second eigenvector of the corresponding stochastic matrix tends to identify these sections. 

Algorithm 3.4 (Spectral clustering). Given a graph, compute the eigenvector $u$ corresponding to the second
eigenvalue of its Laplacian. Let $S_i$ denote the vertices corresponding to the $i$ smallest entries of $D^{-1/2}u$.
Minimize

$$h_i := \frac{E(S_i, S_i^c)}{\min\{\operatorname{vol}(S_i), \operatorname{vol}(S_i^c)\}},$$

where $E(S, T)$ denotes the number of edges between $S$ and $T$ and $\operatorname{vol}(S) = \sum_{v \in S} \deg(v)$. Output the minimizer
$S_k$ or its complement, whichever is smaller in size.

Note that when implementing spectral clustering, the values of $E(S_i, S_i^c)$, $\operatorname{vol}(S_i)$ and $\operatorname{vol}(S_i^c)$ can be cheaply
updated from their $(i-1)$st values since $S_i$ differs from $S_{i-1}$ in only one vertex. As such, the bottleneck
in spectral clustering is computing an eigenvector. For our application, we will iteratively apply spectral
clustering to identify small collections of vertices which are poorly connected to the rest of the graph and
then remove them to enhance connectivity:

Algorithm 3.5 (Pruning for connectivity). Given a graph $G$ and a threshold $\tau > 0$, initialize $H \leftarrow G$.
While the spectral gap of $H$ is smaller than $\tau$, perform spectral clustering (Algorithm 3.4) to identify a small
set of vertices $S$, and update $H \leftarrow H \setminus S$. Once the spectral gap of $H$ is at least $\tau$, output $H$.
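Algorithms 3.4 and 3.5 can be sketched together with a dense eigensolver. This is an illustrative implementation assuming NumPy, a symmetric 0/1 adjacency matrix without self-loops, and no isolated vertices in any intermediate graph; the function names and the two-clique test graph are assumptions.

```python
import numpy as np

def normalized_laplacian(A):
    s = 1.0 / np.sqrt(A.sum(axis=1))              # assumes every degree is positive
    return np.eye(len(A)) - s[:, None] * A * s[None, :]

def spectral_cluster(A):
    """Sweep cut over D^{-1/2} u; returns positions of the smaller side of the best cut."""
    n = len(A)
    deg = A.sum(axis=1)
    _, vecs = np.linalg.eigh(normalized_laplacian(A))
    order = np.argsort(vecs[:, 1] / np.sqrt(deg))
    in_S = np.zeros(n, dtype=bool)
    cut = vol = 0.0
    best_ratio, best_k = np.inf, 0
    for k, v in enumerate(order[:-1], start=1):
        # moving v into S: its edges into S stop being cut, the rest become cut
        cut += deg[v] - 2 * A[v, in_S].sum()
        vol += deg[v]
        in_S[v] = True
        ratio = cut / min(vol, deg.sum() - vol)
        if ratio < best_ratio:
            best_ratio, best_k = ratio, k
    return order[:best_k] if 2 * best_k <= n else order[best_k:]

def prune_for_connectivity(A, tau):
    """Remove poorly connected vertex sets until the spectral gap is at least tau."""
    keep = np.arange(len(A))
    while len(keep) > 2:
        H = A[np.ix_(keep, keep)]
        if np.linalg.eigvalsh(normalized_laplacian(H))[1] >= tau:
            break
        keep = np.delete(keep, spectral_cluster(H))
    return keep

# two 4-cliques joined by a single edge (3, 4)
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0
kept = prune_for_connectivity(A, tau=1.0)
print(sorted(kept.tolist()))
```

On this example one pass of spectral clustering separates the cliques, and the surviving clique has spectral gap $4/3$, so the loop stops. For large sparse graphs one would replace the dense eigendecomposition with an iterative solver.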



In the appendix, we show that for a particular choice of threshold $\tau$, Algorithm 3.5 recovers a level of
connectivity that may have been lost when pruning for reliability in Algorithm 3.2, and it does so by
removing only a small proportion of the vertices.

At this point, we have pruned our graph so that the measured relative phases are reliable and the vertex 
phases can be stably reconstructed. Now we seek an efficient method to reconstruct these vertex phases from 
the measured relative phases. Before devising such a method, we first organize the information we have into 
a matrix. Given the graph output $G' = (V', E')$ of Algorithm 3.5, we take $A_1$ to be the $|V'| \times |V'|$ weighted
adjacency matrix with normalized noisy versions of (2.4):

$$A_1[i,j] = \frac{\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle + \varepsilon_{ij}}{\big|\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle + \varepsilon_{ij}\big|} \tag{3.2}$$

whenever $\{i,j\} \in E'$, and otherwise $A_1[i,j] = 0$. Unlike the noiseless case, here, we account for both
directions $(i,j)$ and $(j,i)$ whenever $\{i,j\} \in E'$, with the understanding that $\varepsilon_{ji} = \overline{\varepsilon_{ij}}$; this will simplify
our analysis since this makes $A_1$ self-adjoint. Considering $A_1[i,j]$ is an approximation of the relative phase
$\omega_i^{-1}\omega_j$, it seems reasonable to extrapolate the vertex phases $\omega := \{\omega_i\}_{i \in V'}$ from $A_1$ by minimizing the
following quantity:

$$\sum_{\{i,j\} \in E'} |\omega_j - A_1[i,j]\omega_i|^2 = \sum_{\{i,j\} \in E'} \big(|\omega_j|^2 - 2\operatorname{Re}\,\overline{\omega_j}A_1[i,j]\omega_i + |A_1[i,j]\omega_i|^2\big) = \omega^*(D - A_1)\omega,$$

where $D$ is the diagonal matrix of vertex degrees. Dividing by $\operatorname{vol}(G') = \omega^*D\omega$, which does not vary with
$\omega$, this is equivalent to minimizing

$$\frac{\omega^*(D - A_1)\omega}{\omega^*D\omega} = \frac{(D^{1/2}\omega)^*(I - D^{-1/2}A_1D^{-1/2})(D^{1/2}\omega)}{\|D^{1/2}\omega\|^2} \geq \lambda_1(I - D^{-1/2}A_1D^{-1/2}).$$

To be clear, the right-hand side above is the first eigenvalue of $L_1 := I - D^{-1/2}A_1D^{-1/2}$, which we call
the connection Laplacian; note that this bears some resemblance to the Laplacian defined in the previous
section. In minimizing the above quantity, it makes sense to consider the eigenvector $u$ corresponding to
the smallest eigenvalue of $L_1$, but we require each coordinate of $D^{-1/2}u$ to have unit modulus. Provided
$u$ has no entries which are zero, we can normalize the entries to form an estimate of $\omega$, and as we show in
the appendix (using results from [7]), this estimate is stable provided the spectral gap of $G'$ is sufficiently
large. This spectral method is known in the literature as angular synchronization [31], and we summarize
the procedure with the following:



Algorithm 3.6 (Angular synchronization). Given a graph $G' = (V', E')$ and noisy versions of (2.4)
for every $\{i,j\} \in E'$, construct a weighted adjacency matrix $A_1$ according to (3.2) whenever $\{i,j\} \in E'$,
and otherwise $A_1[i,j] = 0$. Take $D$ to be the diagonal matrix of vertex degrees and define the connection
Laplacian $L_1 := I - D^{-1/2}A_1D^{-1/2}$. Compute the eigenvector corresponding to the smallest eigenvalue of
$L_1$, and output normalized versions of its coordinates.
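The synchronization step can be sketched with a dense Hermitian eigensolver. The example assumes NumPy, noiseless edge entries built as $\overline{p_i}p_j$ for known unit-modulus phases $p_i$, and a small circulant graph; with entries built this way the bottom eigenvector is proportional to $D^{1/2}\overline{p}$, so the sketch conjugates the normalized coordinates, which is a bookkeeping convention of this example rather than a statement about the paper's implementation.

```python
import numpy as np

def angular_synchronization(A1):
    """Estimate vertex phases from a Hermitian matrix with unit-modulus edge entries."""
    deg = np.count_nonzero(A1, axis=1).astype(float)   # vertex degrees
    s = 1.0 / np.sqrt(deg)                             # assumes no isolated vertices
    L1 = np.eye(len(A1)) - s[:, None] * A1 * s[None, :]
    _, vecs = np.linalg.eigh(L1)                       # ascending eigenvalues
    u = vecs[:, 0]                                     # bottom eigenvector
    # entries A1[i, j] ~ conj(p_i) * p_j make u proportional to D^{1/2} conj(p)
    return np.conj(u / np.abs(u))                      # assumes u has no zero entries

rng = np.random.default_rng(3)
n = 8
p = np.exp(2j * np.pi * rng.random(n))                 # true vertex phases
A1 = np.zeros((n, n), dtype=complex)
for i in range(n):
    for step in (1, 2):                                # 4-regular circulant graph
        j = (i + step) % n
        A1[i, j] = np.conj(p[i]) * p[j]
        A1[j, i] = np.conj(A1[i, j])

est = angular_synchronization(A1)
g = np.vdot(est, p) / abs(np.vdot(est, p))             # align the global phase
err = np.max(np.abs(p - g * est))
print(err)
```

Unlike breadth-first propagation, every edge contributes to every vertex estimate here, which is what allows noise on individual edges to average out.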

To reiterate, Algorithm 3.6 will produce estimates for the phases of the inner products $\{\langle x, \varphi_i\rangle\}_{i \in V'}$. Also,
we can take square roots of the vertex measurements $\{|\langle x, \varphi_i\rangle|^2 + \nu_i\}_{i \in V'}$ to estimate $\{|\langle x, \varphi_i\rangle|\}_{i \in V'}$. Then
we can combine these to estimate $\{\langle x, \varphi_i\rangle\}_{i \in V'}$. However, note that the largest of these inner products will
be most susceptible to noise in the corresponding phase estimate. As such, we remove a small fraction of
these largest vertices so that the final collection of vertices $V''$ has size $\kappa|V|$, where $V$ was the original vertex
set, and $\kappa$ is sufficiently close to 1.

Now that we have estimated the phases of $\{\langle x, \varphi_i\rangle\}_{i \in V''}$, we wish to reconstruct $x$ by applying the Moore-
Penrose pseudoinverse of $\{\varphi_i\}_{i \in V''}$. However, since $V''$ is likely a strict subset of $V$, it can be difficult in
general to predict how stable the pseudoinverse will be. Fortunately, a recent theory of numerically erasure-
robust frames (NERFs) makes this prediction possible: If the members of $\Phi_V$ are independent Gaussian
vectors, then with high probability, every submatrix of columns $\Phi_{V''}$ with $\kappa = |V''|/|V|$ sufficiently large has
a stable pseudoinverse [15]. This concludes the phase retrieval procedure, briefly outlined below together
with the measurement design.

Measurement design (noisy case) 

• Fix d even and e > 0. 

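• Given $M$, pick some $d$-regular graph $G = (V, E)$ with spectral gap $\lambda_2 \geq \lambda' := 1 - \frac{2\sqrt{d-1}+\varepsilon}{d}$ and
$|V| = cM \log M$ for $c$ sufficiently large, and arbitrarily direct the edges.

• Design the measurements $\Phi := \Phi_V \cup \Phi_E$ by taking $\Phi_V := \{\varphi_i\}_{i \in V} \subseteq \mathbb{C}^M$ to have independent
entries with distribution $\mathcal{CN}(0, \frac{1}{M})$ and $\Phi_E := \bigcup_{(i,j) \in E} \{\varphi_i + \omega^k\varphi_j\}_{k=0}^{2}$.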

Phase retrieval procedure (noisy case) 

• Given $\{|\langle x, \varphi_\ell\rangle|^2 + \nu_\ell\}_{\ell=1}^{N}$, prune the graph $G$, keeping only reliable vertices (Algorithm 3.2).

• Prune the remaining induced subgraph for connectivity, producing the vertex set $V'$ (Algorithm 3.5).

• Estimate the phases of the vertex measurements using angular synchronization (Algorithm 3.6).

• Remove the vertices with the largest measurements, keeping only $|V''| = \kappa|V|$.

• Having estimates for $\{\langle x, \varphi_i\rangle\}_{i \in V''}$ up to a global phase factor, find the least-squares estimate of $[x]$
by applying the Moore-Penrose pseudoinverse of $\{\varphi_i\}_{i \in V''}$, see (2.2).

Having established our measurement design and phase retrieval procedure for the noisy case, we now present 
the following guarantee of stable performance: 

Theorem 3.7. Pick $N \sim CM\log M$ with $C$ sufficiently large, and take $\{\varphi_\ell\}_{\ell=1}^{N} = \Phi_V \cup \Phi_E$ defined in the 
measurement design above. Then there exist constants $C', K > 0$ such that the following guarantee holds for 
all $x \in \mathbb{C}^M$ with overwhelming probability: Consider measurements of the form 

\[ z_\ell := |\langle x,\varphi_\ell\rangle|^2 + \nu_\ell. \]

If the noise-to-signal ratio satisfies $\mathrm{NSR} := \frac{\|\nu\|}{\|x\|^2} \leq \frac{C'}{\sqrt{M}}$, then the phase retrieval procedure above produces 
an estimate $\tilde{x}$ from $\{z_\ell\}_{\ell=1}^{N}$ with squared relative error 

\[ \frac{\|\tilde{x} - e^{i\theta}x\|^2}{\|x\|^2} \leq K\sqrt{\frac{M}{\log M}}\,\mathrm{NSR} \]

for some phase $\theta \in [0, 2\pi)$. 

The interested reader is directed to the appendix for a proof of this guarantee. Before concluding this section, 
we briefly evaluate the result. Note that the norms of the $\varphi_\ell$'s tend to be $O(1)$, and so the noiseless 
measurements $|\langle x,\varphi_\ell\rangle|^2$ tend to be of size $O(\|x\|^2/M)$. Also, in the worst-case scenario, the noise annihilates our 
measurements, $\nu_\ell = -|\langle x,\varphi_\ell\rangle|^2$, rendering the signal $x$ unrecoverable; in this case, $\|\nu\| = O(\|x\|^2\sqrt{(\log M)/M})$ 
since $N = CM\log M$. In other words, if we allowed the noise-to-signal ratio to scale slightly larger than 
$C'/\sqrt{M}$ (i.e., by a log factor), then it would be impossible to perform phase retrieval in the worst case. As 
such, the above guarantee is optimal in some sense. Furthermore, since $\sqrt{M/\log M}\,\mathrm{NSR} = O(1/\sqrt{\log M})$ 
by assumption, the result indicates that our phase retrieval process exhibits more stability as $M$ grows large. 

4. Concluding remarks. This paper provides a new way to perform phase retrieval, and our main 
result (Theorem 3.7) shows that our method is stable. In comparison to the state of the art, namely, the 
work of Candès, Strohmer and Voroninski [10], our stability results mainly differ in how we choose to scale 
the measurement vectors. Indeed, the measurement vectors of the present paper tend to have norm $O(1)$, 
whereas the measurement vectors of Candès et al. are all scaled to have norm $\sqrt{M}$; considering that the statement 
of Theorem 3.7 is riddled with square roots of $M$, either choice of scaling is arguably natural. 

One might feel that our phase retrieval algorithms are slightly unsatisfying because we perform hard 
thresholds to remove vertices according to how small or large the corresponding measurements are. Alternatively, 
there could very well be a way to more smoothly weight these measurements according to our confidence in 
them, and such weightings are already accounted for in the theory of angular synchronization [7]. However, 
we decided to use hard thresholds because they greatly simplify the analysis of projective uniformity (though 
the analysis is still rather technical). 

While the worst-case analysis we provide here is useful in many applications (and enables a comparison with 
the worst-case stability results of [10]), stochastic noise is a more appropriate model in other applications. 
We believe that the phase retrieval procedure of this paper will perform substantially better in the average 
case, but we leave this analysis for future work. Also, a notable distinction between our measurement designs 
in the noiseless and noisy cases is the presence of a log factor in the number of measurements used. However, 
we believe this factor is an artifact of our current analysis, and we intend to remove it in the future. 



5. Appendix. 



5.1. Graph pruning. This section proves the following guarantee: 

Theorem 5.1. Take proportions $p \geq q \geq \frac{2}{3}$, and consider a regular graph $G = (V,E)$ with spectral gap 
$\lambda_2 > g(p,q) := 1 - 2(q(1-q) - (1-p))$. After Algorithm 3.2 removes at most $(1-p)|V|$ vertices from $G$, 
then setting $\tau = \frac{1}{8}(\lambda_2 - g(p,q))^2$, Algorithm 3.5 outputs a subgraph with at least $q|V|$ vertices. 

To prove this theorem, we will apply a graph version of the Cheeger inequality, which provides a guarantee 
for Algorithm 3.4: 

Theorem 5.2 (Constructive Cheeger inequality [12]). Consider a graph $G = (V,E)$ with spectral gap $\lambda_2$. 
Then Algorithm 3.4 outputs a set of vertices $S$ such that $h(S) \leq \sqrt{2\lambda_2}$. 



Proof. [Proof of Theorem 5.1] First, Algorithm 3.2 removes a set of vertices, which we denote by $S_0$. In 
applying Algorithm 3.5, the $i$th step of the while loop removes another set of vertices $S_i$. We claim this 
while loop will end with $|\bigcup_{i\geq 0} S_i| < (1-q)|V|$. Supposing to the contrary, consider the first $k$ for which 
$S := \bigcup_{i=0}^{k} S_i$ has at least $(1-q)|V|$ vertices. Then since each iteration of the while loop removes at most 
half of the remaining vertices, we have $(1-q)|V| \leq |S| \leq (1-\frac{q}{2})|V|$. 

To derive a contradiction, we will find incompatible upper and lower bounds on $E(S,S^c)$. For the upper 
bound, we apply the fact that $G$ is $d$-regular along with the definition of $h(S_i)$ to get 

\[ E(S,S^c) = E(S_0,S^c) + \sum_{i=1}^{k} E(S_i,S^c) \leq \mathrm{vol}(S_0) + \sum_{i=1}^{k} E(S_i,S_i^c) \leq d|S_0| + \sum_{i=1}^{k} h(S_i)\,\mathrm{vol}(S_i). \]

Next, Theorem 5.2 bounds each $h(S_i)$ in terms of the spectral gap of the remaining graph, which is necessarily 
less than $\tau$ by the condition of the while loop. Thus, we continue: 

\[ E(S,S^c) \leq d|S_0| + \sum_{i=1}^{k}\sqrt{2\tau}\,d|S_i| \leq d(1-p)|V| + \sqrt{2\tau}\,d|S| \leq d|V|\Big((1-p) + \sqrt{2\tau}\big(1-\tfrac{q}{2}\big)\Big). \tag{5.1} \]

For the lower bound, we use the expander mixing lemma, which says 

\[ \Big| E(S,S^c) - \frac{d}{|V|}|S||S^c| \Big| \leq d(1-\lambda_2)\sqrt{|S||S^c|}. \]

Since the function $x \mapsto x(1-x)$ is concave down, we know the minimum subject to $1-q \leq x \leq 1-\frac{q}{2}$ 
is achieved at an endpoint of this interval. Thus, evaluating the function at $|S|/|V|$ gives $|S||S^c|/|V|^2 \geq 
\min\{q(1-q), \frac{q}{2}(1-\frac{q}{2})\} = q(1-q)$, where the last step follows from the fact that $q \geq \frac{2}{3}$. Applying this and 
$|S||S^c| \leq |V|^2/4$ to the expander mixing lemma then gives 

\[ E(S,S^c) \geq \frac{d}{|V|}|S||S^c| - d(1-\lambda_2)\sqrt{|S||S^c|} \geq d|V|q(1-q) - d(1-\lambda_2)\frac{|V|}{2} = d|V|\Big(q(1-q) - \frac{1-\lambda_2}{2}\Big). \tag{5.2} \]

Finally, we combine (5.1) and (5.2) to get 

\[ q(1-q) + \frac{\lambda_2-1}{2} \leq (1-p) + \sqrt{2\tau}\Big(1-\frac{q}{2}\Big), \]

which, as substitution reveals, contradicts our choice for $\tau$. □ 
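
The final substitution in the proof above can be checked numerically. The sketch below evaluates $g(p,q)$ and $\tau$ as defined in Theorem 5.1 and confirms that the upper bound (5.1) falls strictly below the lower bound (5.2), both divided by $d|V|$, whenever $\lambda_2 > g(p,q)$. The function names and the sample values of $p$, $q$ and $\lambda_2$ are illustrative assumptions, not values taken from the paper.

```python
import math

def g(p, q):
    """Spectral-gap threshold g(p, q) from Theorem 5.1."""
    return 1.0 - 2.0 * (q * (1.0 - q) - (1.0 - p))

def tau(lambda2, p, q):
    """Pruning threshold tau = (lambda2 - g(p, q))**2 / 8."""
    return (lambda2 - g(p, q)) ** 2 / 8.0

def bounds_are_incompatible(lambda2, p, q):
    """Compare (5.1) and (5.2) after dividing both by d|V|."""
    upper = (1.0 - p) + math.sqrt(2.0 * tau(lambda2, p, q)) * (1.0 - q / 2.0)
    lower = q * (1.0 - q) - (1.0 - lambda2) / 2.0
    return upper < lower

# Hypothetical parameters with p >= q >= 2/3 and lambda2 > g(p, q).
p, q, lambda2 = 0.95, 0.70, 0.80
print(g(p, q), tau(lambda2, p, q), bounds_are_incompatible(lambda2, p, q))
```

Writing $X := \frac{1}{2}(\lambda_2 - g(p,q)) = \sqrt{2\tau}$, the combined inequality reads $X \leq X(1-\frac{q}{2})$, which is impossible for $X > 0$ and $q > 0$; the code only confirms this arithmetic on sample values.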



5.2. Angular synchronization. This section proves the following guarantee: 

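Theorem 5.3. Consider a graph $G = (V, E)$ with spectral gap $\lambda_2 > 0$, and define $\|\theta\|_{\mathbb{T}} := \min_{k\in\mathbb{Z}}|\theta - 2\pi k|$ 
for all angles $\theta \in \mathbb{R}/2\pi\mathbb{Z}$. Given the weighted adjacency matrix $A_1$ (3.2), then Algorithm 3.6 outputs $u \in \mathbb{C}^{|V|}$ 
with unit-modulus entries such that, for some phase $\theta \in \mathbb{R}/2\pi\mathbb{Z}$, 

\[ \sum_{i\in V}\big\|\arg(u_i) - \arg(\langle x,\varphi_i\rangle) - \theta\big\|_{\mathbb{T}}^2 \leq \frac{C\|\varepsilon\|^2}{\lambda_2^2P^2}, \]

where $P := \min_{\{i,j\}\in E}|\overline{\langle x,\varphi_i\rangle}\langle x,\varphi_j\rangle + \varepsilon_{ij}|$ and $C$ is a universal constant. 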

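In terms of phases, we are interested in reconstructing $\omega^* : V \to \mathbb{R}/2\pi\mathbb{Z}$ such that $\omega^*_i = \arg(\langle x,\varphi_i\rangle)$ for 
every $i \in V$. For some graph $G = (V, E)$, we are given a weighted adjacency matrix $A_1$ (3.2), which encodes 
edge measurements: 

\[ \rho_{ij} := \omega^*_j - \omega^*_i + \epsilon_{ij} \]

for every $(i,j) \in E$; here, $\epsilon_{ij} \in \mathbb{R}/2\pi\mathbb{Z}$ is angular noise, and we note that $\epsilon_{ji} = -\epsilon_{ij}$. We measure the size 
of $\epsilon = \{\epsilon_{ij}\}_{\{i,j\}\in E}$ in terms of its components: $\|\epsilon\|_{\mathbb{T}}^2 := \sum_{\{i,j\}\in E}\|\epsilon_{ij}\|_{\mathbb{T}}^2$. 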

We will prove Theorem 5.3 using a recent Cheeger inequality for the connection Laplacian [7]. In effect, this 
result provides a guarantee for Algorithm 3.6 in terms of a certain objective function. To be precise, given 
$\omega : V \to \mathbb{R}/2\pi\mathbb{Z}$, we define 

\[ \eta(\omega) := \sum_{\{i,j\}\in E}\big|e^{i(\omega_j - \omega_i - \rho_{ij})} - 1\big|^2. \]

Note that one way to reconstruct $\omega^*$ is to minimize this quantity. Indeed, $\eta(\omega)$ is small when the angular 
differences $\omega_j - \omega_i$ are close to the measured differences $\rho_{ij}$, and in the noiseless case, $\eta(\omega) = 0$ precisely 
when $\omega = \omega^*$, provided the graph $G$ is connected. In terms of this objective function, the following guarantee 
ensures that the output of Algorithm 3.6 is no worse than a constant multiple of optimal: 

Theorem 5.4 (Cheeger inequality for the connection Laplacian [7]). Consider a graph $G = (V,E)$ with 
spectral gap $\lambda_2 > 0$. Given the weighted adjacency matrix $A_1$ (3.2), which encodes edge measurements 
$\rho : E \to \mathbb{R}/2\pi\mathbb{Z}$, then Algorithm 3.6 outputs $u \in \mathbb{C}^{|V|}$, which encodes $\omega : V \to \mathbb{R}/2\pi\mathbb{Z}$ such that 

\[ \eta(\omega) \leq \frac{C'}{\lambda_2}\min_{\tilde\omega : V \to \mathbb{R}/2\pi\mathbb{Z}}\eta(\tilde\omega), \tag{5.3} \]

where $C'$ is a universal constant. 

We start with two lemmas: 

Lemma 5.5. For every $a, b \in \mathbb{R}/2\pi\mathbb{Z}$, we have $\frac{1}{2}\|a\|_{\mathbb{T}}^2 - \|b\|_{\mathbb{T}}^2 \leq \|a-b\|_{\mathbb{T}}^2$. 

Proof. By abuse of notation, we identify $a$ and $b$ with their coset representatives in $[-\pi,\pi)$. Since 

\[ 0 \leq \tfrac{1}{2}(a-2b)^2 = (a-b)^2 - (\tfrac{1}{2}a^2 - b^2), \]

then rearranging gives the following inequality: 

\[ \tfrac{1}{2}\|a\|_{\mathbb{T}}^2 - \|b\|_{\mathbb{T}}^2 = \tfrac{1}{2}a^2 - b^2 \leq (a-b)^2 = \|a-b\|_{\mathbb{T}}^2, \]

where the last equality requires $a - b \in [-\pi,\pi]$. Otherwise, we note that 

\[ 0 \leq \tfrac{1}{2}(a-2b\pm 2\pi)^2 + 2\pi(\pi\pm a) = (a-b\pm 2\pi)^2 - (\tfrac{1}{2}a^2 - b^2), \]

and so rearranging gives 

\[ \tfrac{1}{2}\|a\|_{\mathbb{T}}^2 - \|b\|_{\mathbb{T}}^2 = \tfrac{1}{2}a^2 - b^2 \leq (a-b\pm 2\pi)^2, \]

one of which equals $\|a-b\|_{\mathbb{T}}^2$ depending on whether $a - b < -\pi$ or $a - b > \pi$. □ 

Lemma 5.6. For every $a, b \in \mathbb{C}$ with $|a| = 1$, we have $\big|a - \frac{b}{|b|}\big| \leq 2|a-b|$. 

Proof. The reverse triangle inequality gives $\big|\frac{b}{|b|} - b\big| = |1 - |b|| = ||a| - |b|| \leq |a-b|$, and so the triangle 
inequality gives $\big|a - \frac{b}{|b|}\big| \leq |a-b| + \big|b - \frac{b}{|b|}\big| \leq 2|a-b|$. □ 
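
Both lemmas are elementary, and a quick randomized check can guard against sign or constant slips when reading them. The sketch below evaluates the slack in each inequality on random inputs; the helper names `torus_norm`, `lemma_5_5_gap` and `lemma_5_6_gap` are hypothetical, and the sampling ranges are arbitrary choices for illustration.

```python
import cmath
import math
import random

def torus_norm(theta):
    """Distance from theta to the nearest integer multiple of 2*pi."""
    r = abs(math.fmod(theta, 2 * math.pi))
    return min(r, 2 * math.pi - r)

def lemma_5_5_gap(a, b):
    """Right-hand side minus left-hand side of Lemma 5.5 (should be >= 0)."""
    return torus_norm(a - b) ** 2 - (0.5 * torus_norm(a) ** 2 - torus_norm(b) ** 2)

def lemma_5_6_gap(a, b):
    """Right-hand side minus left-hand side of Lemma 5.6, for |a| = 1, b != 0."""
    return 2 * abs(a - b) - abs(a - b / abs(b))

rng = random.Random(0)
worst_55 = min(
    lemma_5_5_gap(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(10000)
)
worst_56 = min(
    lemma_5_6_gap(
        cmath.exp(1j * rng.uniform(0, 2 * math.pi)),
        complex(rng.gauss(0, 1), rng.gauss(0, 1)),
    )
    for _ in range(10000)
)
print(worst_55, worst_56)  # both should be nonnegative up to rounding
```

A passing check is of course no substitute for the proofs; it only confirms that the statements as written are consistent on sampled inputs.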

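Proof. [Proof of Theorem 5.3] Identifying $\theta \in \mathbb{R}/2\pi\mathbb{Z}$ with its coset representative in $[-\pi,\pi)$, we have 
$\|\theta\|_{\mathbb{T}} = |\theta|$, and a double-angle formula gives $|e^{i\theta} - 1| = 2|\sin\frac{\theta}{2}|$. From these, it follows that 

\[ \tfrac{2}{\pi}\|\theta\|_{\mathbb{T}} \leq |e^{i\theta} - 1| \leq \|\theta\|_{\mathbb{T}}. \tag{5.4} \]

This relationship will allow us to apply Theorem 5.4. To this end, for notational convenience, we define 
$\gamma_i := \arg(u_i) - \arg(\langle x,\varphi_i\rangle)$ for every $i \in V$. The right-hand inequality of (5.4) and Lemma 5.5 together give 

\[ \tfrac{1}{2}\sum_{\{i,j\}\in E}|e^{i(\gamma_j-\gamma_i)} - 1|^2 - \|\epsilon\|_{\mathbb{T}}^2 \leq \sum_{\{i,j\}\in E}\Big(\tfrac{1}{2}\|\gamma_j-\gamma_i\|_{\mathbb{T}}^2 - \|\epsilon_{ij}\|_{\mathbb{T}}^2\Big) \leq \sum_{\{i,j\}\in E}\|\gamma_j-\gamma_i-\epsilon_{ij}\|_{\mathbb{T}}^2. \tag{5.5} \]

Denoting $\omega_i := \arg(u_i)$ and $\omega^*_i := \arg(\langle x,\varphi_i\rangle)$, then the definition of $\rho_{ij}$ gives 

\[ \eta(\omega) = \sum_{\{i,j\}\in E}|e^{i(\gamma_j-\gamma_i-\epsilon_{ij})} - 1|^2. \]

With this, we continue (5.5) by applying the left-hand inequality of (5.4): 

\[ \tfrac{1}{2}\sum_{\{i,j\}\in E}|e^{i(\gamma_j-\gamma_i)} - 1|^2 - \|\epsilon\|_{\mathbb{T}}^2 \leq \frac{\pi^2}{4}\eta(\omega) \leq \frac{\pi^2C'}{4\lambda_2}\min_{\tilde\omega}\eta(\tilde\omega) \leq \frac{\pi^2C'}{4\lambda_2}\eta(\omega^*), \tag{5.6} \]

where the second inequality follows from Theorem 5.4. Furthermore, the right-hand inequality of (5.4) gives 

\[ \eta(\omega^*) = \sum_{\{i,j\}\in E}|e^{-i\epsilon_{ij}} - 1|^2 \leq \|\epsilon\|_{\mathbb{T}}^2, \]

and so combining this with (5.6) gives 

\[ \sum_{\{i,j\}\in E}|e^{i(\gamma_j-\gamma_i)} - 1|^2 \leq 2\Big(\frac{\pi^2C'}{4\lambda_2} + 1\Big)\|\epsilon\|_{\mathbb{T}}^2. \tag{5.7} \]

At this point, take $\alpha := \frac{1}{\mathrm{vol}(G)}\sum_{i\in V}\deg(i)e^{i\gamma_i}$, define $v$ entrywise by $v_i := e^{i\gamma_i}$, and set $w := v - \alpha 1$. Then 

\[ (D^{1/2}1)^*(D^{1/2}w) = \sum_{i\in V}\deg(i)(e^{i\gamma_i} - \alpha) = 0, \]

i.e., $D^{1/2}w$ is orthogonal to $D^{1/2}1$. Also, since $(D - A)1 = 0$, we have 

\[ L(D^{1/2}1) = (I - D^{-1/2}AD^{-1/2})D^{1/2}1 = D^{-1/2}(D - A)D^{-1/2}D^{1/2}1 = 0, \]

and so $D^{1/2}w$ is orthogonal to a first eigenvector of $L$, considering $L$ is positive semidefinite. Thus, 

\[ \frac{(D^{1/2}w)^*L(D^{1/2}w)}{w^*Dw} \geq \min_{\substack{y\in\mathbb{C}^{|V|} \\ y\perp D^{1/2}1}}\frac{y^*Ly}{y^*y} = \lambda_2. \]

Rearranging then gives 

\[ \lambda_2 w^*Dw \leq (D^{1/2}w)^*L(D^{1/2}w) = (v - \alpha 1)^*(D - A)(v - \alpha 1) = v^*(D - A)v. \]

Continuing, we apply the definitions of $D$ and $A$ to get 

\[ \lambda_2 w^*Dw \leq 2|E| - \sum_{i\in V}\sum_{j\in V}e^{-i\gamma_i}A[i,j]e^{i\gamma_j} = \sum_{\{i,j\}\in E}\big(2 - e^{i(\gamma_i-\gamma_j)} - e^{i(\gamma_j-\gamma_i)}\big) = \sum_{\{i,j\}\in E}|e^{i\gamma_j} - e^{i\gamma_i}|^2. \]

Next, we factor $e^{i\gamma_i}$ from the inside of each term and apply (5.7) to get 

\[ \lambda_2\sum_{i\in V}\deg(i)|e^{i\gamma_i} - \alpha|^2 = \lambda_2 w^*Dw \leq \sum_{\{i,j\}\in E}|e^{i(\gamma_j-\gamma_i)} - 1|^2 \leq 2\Big(\frac{\pi^2C'}{4\lambda_2} + 1\Big)\|\epsilon\|_{\mathbb{T}}^2. \tag{5.8} \]

From here, we proceed in two cases. First, when $\alpha \neq 0$, we may take $\theta := \arg(\alpha)$. Then the left-hand 
inequality of (5.4) and Lemma 5.6 give 

\[ \sum_{i\in V}\|\gamma_i - \theta\|_{\mathbb{T}}^2 \leq \frac{\pi^2}{4}\sum_{i\in V}|e^{i(\gamma_i-\theta)} - 1|^2 = \frac{\pi^2}{4}\sum_{i\in V}\Big|e^{i\gamma_i} - \frac{\alpha}{|\alpha|}\Big|^2 \leq \pi^2\sum_{i\in V}|e^{i\gamma_i} - \alpha|^2 \leq \pi^2\sum_{i\in V}\deg(i)|e^{i\gamma_i} - \alpha|^2, \tag{5.9} \]

where the last inequality uses the fact that $\deg(i) \geq 1$ for every $i \in V$, i.e., the graph $G$ has no isolated 
vertex since $G$ is connected, which follows from the fact that $\lambda_2 > 0$. In the case where $\alpha = 0$, we may 
arbitrarily take $\theta = 0$. Then similar analysis yields 

\[ \sum_{i\in V}\|\gamma_i - \theta\|_{\mathbb{T}}^2 \leq \frac{\pi^2}{4}\sum_{i\in V}|e^{i\gamma_i} - 1|^2 \leq \pi^2|V| \leq \pi^2\sum_{i\in V}\deg(i) = \pi^2\sum_{i\in V}\deg(i)|e^{i\gamma_i} - \alpha|^2, \]

where in this case, the second inequality applies the triangle inequality to each term instead of Lemma 5.6. 
Note that both cases produce the same bound on $\sum_{i\in V}\|\gamma_i - \theta\|_{\mathbb{T}}^2$, which we now conclude by combining 
(5.8) and (5.9): 

\[ \sum_{i\in V}\|\gamma_i - \theta\|_{\mathbb{T}}^2 \leq \frac{2\pi^2}{\lambda_2}\Big(\frac{\pi^2C'}{4\lambda_2} + 1\Big)\|\epsilon\|_{\mathbb{T}}^2 \leq \frac{C\|\varepsilon\|^2}{\lambda_2^2P^2}. \]

The last inequality follows from Lemma 3.3, taking $z = \overline{\langle x,\varphi_i\rangle}\langle x,\varphi_j\rangle + \varepsilon_{ij}$ and $\varepsilon = -\varepsilon_{ij}$. □ 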
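
To make the objective $\eta$ concrete, the following sketch evaluates it on a small hypothetical graph with synthetic angular noise. It checks two facts used in the proof: that $\eta(\omega^*)$ is sandwiched by the two sides of (5.4) applied edge by edge, and that $\eta$ is invariant under a global phase shift. The graph, the noise level and all names are illustrative assumptions; this is not an implementation of Algorithm 3.6.

```python
import cmath
import math
import random

def torus_norm(theta):
    """Distance from theta to the nearest integer multiple of 2*pi."""
    r = abs(math.fmod(theta, 2 * math.pi))
    return min(r, 2 * math.pi - r)

def eta(omega, rho, edges):
    """Objective eta(omega): sum over edges of |exp(i(w_j - w_i - rho_ij)) - 1|^2."""
    return sum(
        abs(cmath.exp(1j * (omega[j] - omega[i] - rho[(i, j)])) - 1) ** 2
        for (i, j) in edges
    )

rng = random.Random(1)
n = 8
# Hypothetical 4-regular graph on 8 vertices: a cycle plus chords of length 3.
edges = [(i, (i + 1) % n) for i in range(n)] + [(i, (i + 3) % n) for i in range(n)]

omega_star = [rng.uniform(-math.pi, math.pi) for _ in range(n)]   # true phases
eps = {e: rng.gauss(0, 0.1) for e in edges}                        # angular noise
rho = {(i, j): omega_star[j] - omega_star[i] + eps[(i, j)] for (i, j) in edges}

eps_norm_sq = sum(torus_norm(v) ** 2 for v in eps.values())
eta_star = eta(omega_star, rho, edges)
eta_shifted = eta([w + 0.7 for w in omega_star], rho, edges)       # global shift

print(eta_star, eps_norm_sq, eta_shifted)
```

The shift invariance is the reason the guarantee of Theorem 5.3 can only be stated up to a global phase $\theta$.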



5.3. Projective uniformity. This section is motivated by Theorem 5.3 of the previous section, which 
exhibits significant dependence on the size of $P$. Here, we show how Algorithm 3.2 ensures that $P$ will not 
be too small, and our guarantee will be in terms of the following noise-robust version of projective uniformity: 

Definition 5.7 (Projective uniformity with noise). Consider a graph $G = (V,E)$ and $M \times |V|$ matrix $\Phi$, 
and for some proportion $\alpha \in (0,1)$, signal $x \in \mathbb{C}^M$ and noise $\varepsilon = \{\varepsilon_{ij}\}_{\{i,j\}\in E}$, let $J(\alpha,x,\varepsilon)$ denote the set 
of vertices that remain after applying Algorithm 3.2. Then the projective uniformity with noise of $\Phi$ is 

\[ \mathrm{PUN}(\Phi;\alpha,\varepsilon) = \min_{\substack{x\in\mathbb{C}^M \\ \|x\|=1}}\ \min_{\substack{\{i,j\}\in E \\ \{i,j\}\subseteq J(\alpha,x,\varepsilon)}} |\overline{\langle x,\varphi_i\rangle}\langle x,\varphi_j\rangle + \varepsilon_{ij}|. \]
Theorem 5.8. Consider a graph $G = (V, E)$ with $|V| \sim CM\log M$ for some constant $C$, and draw the 
entries of an M X |V| matrix independently from GA/"(0, -p>). Then for each proportion a < 1 — there 
exists a constant C > such that, with overwhelming probability, 



:{||£|| 2 ,VI7PUN($;a,£)} 



JM 

for every noise vector e = {sij}u,j}eE- 

To prove this theorem, we apply the following lemma, which follows from concentration-of-measure arguments 
that we provide later: 

Lemma 5.9. Take n ~ CMlogM for some constant C, and draw the entries of an M x n matrix $ 
independently from CAf(0, jj). Then for each proportion a < 1 — ^ , there exists a constant C > such 
that 

PU(4>;a)>g 

with overwhelming probability. 

Proof. [Proof of Theorem 5.8] Denote $n := |V|$. Note that if 



e|| 2 > V^n-iPU($;a+i±s), (5.10) 



then by Lemma l5.9[ we have ||e||2 > -7= with overwhelming probability for some constant C > 0. It 



remains to consider the case where (|5.10l) does not hold. To this end, define 



Ji(a, x, e) := argmax min \(x, ifi)(x, cpj) + £ij\. 

JG{l,...,n} {i,j}£J 
\J\>an 

Recalling the definition of J{a.,x,e) in Definition 15. 71 we claim that J {a, x, e) C Jx(a, x, e). To see this, 
label each edge {i,j} G E with \(x,<pi)(x,ipj)+eij\. Then by definition, V\Ji(a,x,e) neighbors more of the 
smallest edges than any other collection of n — [an] = [(1 — a)n\ vertices. By comparison, Algorithm 13.21 
effectively deletes the smallest edge by deleting both of its incident vertices i and j, one of which must 

be in J\ (a, x, e). The smallest edge in the remaining graph is guaranteed to not touch either i or j, but 
rather some k € J\(a, x, e), provided J\(a, x, e) has not yet been completely removed from the graph. After 
[(1 — a)n\ iterations, then by the pigeonhole principle, all vertices in V\ Ji{a, x, e) will be removed, meaning 
J{ol, x, e) C J\(a,x,e), as claimed. This implies that 



r ■ ^ , I ^' ^ ( X ' + I - r • ^ si ( X ' ^) ( X ' W) + £l 3 I ' 



and so minimizing over all unit vectors x gives 



PUN($;a,e)> min max min \(x, <pi)(x, ipj) + e«|. (5-11) 

xGC M JC{l,..,n} {i,j}ej 
\\x\\=l \J\>an 



Next, define 



J2(a,x, e) := argmax min \(x,tpi)(x,<pj)\. 

JC{l,...,n} {ij}eJ 
\J\>(a+^)n 

Since in this case (|5.10[) does not hold, there are at most ^-^-n edges {i,j} £ E with |ey| > iPU(<f>; a+i^). 
Some of these edges are induced by Ji{ol, x, e); as such, delete at most ^-^-n vertices from ^(a, x, e) which 
are incident to all of these edges, and denote the remaining vertices by $J_3(\alpha, x, \varepsilon)$. By construction, we have 
\J3(a,x,e)\ > an, and also the edges {i,j} induced by J-$(a,x,e) satisfy \eij\ < iPU($;a+ ^Mp 1 )- These 
facts combined with the triangle inequality then give 



„ ™ ax , r m } n ^ I <Pi) ( x > + e « I > r . min I (x, p,} (x, (fj) + Sij \ 

JC{l,...,n\ {t,j}£j 3 (a,x,e) 



\J\>am 



> 



> min \( Xl(pi )( Xj(pj )\-ip\J($.a+^). (5.12) 

{!j(ej3(",3:,E) 



Continuing, we use the fact that ^(a, x, e) C Jjfttj ^e) to get 



min |(a;,^)(a;,^-)| > min ^)(x, <^)| 



> min max min \(x,ipj)(x,Wj)\ 

INI=1 \J\>{a+^)n 

> PU($;a+ i±«). (5.13) 

Combining ([5TTT]l . (f5~T2]l and (|5~T3)) then gives PUN($; a, e) > iPU($;a + i±S), and so the result follows 
directly from Lemma 15.91 □ 



We conclude this section with the rather technical proof of Lemma 5.9. 

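Proof. [Proof of Lemma 5.9] We seek a bound on the probability that $\mathrm{PU}(\Phi;\alpha) < \frac{C}{M}$. To do so, we will 
cover this event with smaller failure events $\mathcal{E}_v$ which correspond to members of a $\delta$-net. First, define 

\[ G_\delta(v) := \{\varphi \in \mathbb{C}^M : |\langle v,\varphi\rangle| \geq 3\delta \text{ and } \|\varphi\| \leq 2\}. \]

Then for each member $v$ of a given $\delta$-net $\mathcal{N}_\delta$ of the unit sphere in $\mathbb{C}^M$, define the failure event 

\[ \mathcal{E}_v := \{\text{less than } \alpha n \text{ columns of } \Phi \text{ lie in } G_\delta(v)\}. \]

We claim that $\{\mathrm{PU}(\Phi;\alpha) < \delta^2\} \subseteq \bigcup_{v\in\mathcal{N}_\delta}\mathcal{E}_v$. To see this, take an outcome $\omega \in (\bigcup_{v\in\mathcal{N}_\delta}\mathcal{E}_v)^c$. By the definition 
of $\mathcal{N}_\delta$, we have that for every unit vector $x \in \mathbb{C}^M$, there exists $v \in \mathcal{N}_\delta$ such that $\|v - x\| \leq \delta$. This gives a 
mapping $x \mapsto v_x$, and we let $\mathcal{I}_x$ denote the indices of columns of $\Phi$ which lie in $G_\delta(v_x)$. Since $\omega \in \bigcap_{v\in\mathcal{N}_\delta}\mathcal{E}_v^c$, 
we have $|\mathcal{I}_x| \geq \alpha n$ for every $x$. Moreover, by the triangle inequality, Cauchy-Schwarz inequality, and the 
definition of $G_\delta$, every $i \in \mathcal{I}_x$ satisfies 

\[ |\langle x,\varphi_i\rangle| \geq |\langle v_x,\varphi_i\rangle| - |\langle v_x - x,\varphi_i\rangle| \geq |\langle v_x,\varphi_i\rangle| - \|v_x - x\|\|\varphi_i\| \geq 3\delta - 2\|v_x - x\| \geq \delta. \]

Applying this inequality then gives 

\[ \mathrm{PU}(\Phi;\alpha) = \min_{\substack{x\in\mathbb{C}^M \\ \|x\|=1}}\ \max_{\substack{\mathcal{I}\subseteq\{1,\ldots,n\} \\ |\mathcal{I}|\geq\alpha n}}\ \min_{i\in\mathcal{I}}|\langle x,\varphi_i\rangle|^2 \geq \min_{\substack{x\in\mathbb{C}^M \\ \|x\|=1}}\ \min_{i\in\mathcal{I}_x}|\langle x,\varphi_i\rangle|^2 \geq \delta^2. \]

Thus $\omega \in \{\mathrm{PU}(\Phi;\alpha) < \delta^2\}^c$, thereby proving our claim. Continuing, the union bound gives 

\[ \Pr(\mathrm{PU}(\Phi;\alpha) < \delta^2) \leq \sum_{v\in\mathcal{N}_\delta}\Pr(\mathcal{E}_v) = |\mathcal{N}_\delta| \cdot \Pr(\mathcal{E}_v), \]

where the equality follows from the fact that the columns of $\Phi$ are independent with rotationally symmetric 
probability distributions. It therefore suffices to bound both $|\mathcal{N}_\delta|$ and $\Pr(\mathcal{E}_v)$. 

To bound $|\mathcal{N}_\delta|$, we follow a standard argument, found in the proof of Lemma 5.2 in [35]. Let $\mathcal{N}_\delta$ be a 
maximal $\delta$-packing of points on the unit sphere in $\mathbb{C}^M$. Since the packing is maximal, it follows that $\mathcal{N}_\delta$ is 
a $\delta$-net. To count these points, map them into $\mathbb{R}^{2M}$ according to $f : v \mapsto (\mathrm{Re}\,v, \mathrm{Im}\,v)$. Consider the open 
balls of radius $\frac{\delta}{2}$ centered at each $f(v)$. Note that the (disjoint) union of these balls is contained in the ball 
of radius $1 + \frac{\delta}{2}$ centered at the origin. Thus, a volume comparison gives 

\[ |\mathcal{N}_\delta| \cdot C\big(\tfrac{\delta}{2}\big)^{2M} = \mathrm{Vol}\Big(\bigsqcup_{v\in\mathcal{N}_\delta}B\big(f(v),\tfrac{\delta}{2}\big)\Big) \leq \mathrm{Vol}\big(B(0, 1+\tfrac{\delta}{2})\big) = C\big(1+\tfrac{\delta}{2}\big)^{2M}, \]

thereby implying $|\mathcal{N}_\delta| \leq (\frac{2}{\delta} + 1)^{2M}$. 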
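To bound $\Pr(\mathcal{E}_v)$, note that 

\[ \Pr(\mathcal{E}_v) = \Pr\Big(\sum_{i=1}^{n}1_{\{\varphi_i\notin G_\delta(v)\}} > (1-\alpha)n\Big). \tag{5.14} \]

As such, we first consider the success probability of these Bernoulli random variables: 

\[ \Pr(\varphi_i \notin G_\delta(v)) \leq \Pr(|\langle v,\varphi_i\rangle| < 3\delta) + \Pr(\|\varphi_i\| > 2). \tag{5.15} \]

By the rotational symmetry of the distribution of $\varphi_i$, we may take $v$ to be the first identity basis element $e_1$ 
without changing the probability. Next, since $\varphi_i = \sum_{m=1}^{M}(a_m + ib_m)e_m$ with the $a_m$'s and $b_m$'s independent 
with distribution $\mathcal{N}(0, \frac{1}{2M})$, we have 

\[ \Pr(|\langle v,\varphi_i\rangle| < 3\delta) = \Pr\big(\sqrt{a_1^2 + b_1^2} < 3\delta\big) \leq \Pr\big(\min\{a_1^2, b_1^2\} < \tfrac{9}{2}\delta^2\big) \leq 2\Pr\big(a_1^2 < \tfrac{9}{2}\delta^2\big), \]

where the last step is by the union bound. Since $a_1$'s density function is at most $\sqrt{M/\pi}$, it follows that 

\[ \Pr(|\langle v,\varphi_i\rangle| < 3\delta) \leq 2\Pr\big(|a_1| < \tfrac{3}{\sqrt{2}}\delta\big) \leq 12\delta\sqrt{\tfrac{M}{2\pi}}. \tag{5.16} \]

For the other term in (5.15), note that $2M\|\varphi_i\|^2$ is a sum of $2M$ independent squared standard Gaussian random variables. 
Applying Lemma 1 of [24] then gives that for every $t > 0$, 

\[ \Pr\big(2M\|\varphi_i\|^2 \geq \sqrt{8Mt} + 2t + 2M\big) \leq e^{-t}. \]

Thus, taking $t = \frac{M}{2}$ gives 

\[ \Pr(\|\varphi_i\| > 2) = \Pr(2M\|\varphi_i\|^2 > 8M) \leq \Pr(2M\|\varphi_i\|^2 \geq 5M) \leq e^{-M/2}. \tag{5.17} \]

Substituting (5.16) and (5.17) into (5.15) then gives 

\[ \Pr(\varphi_i \notin G_\delta(v)) \leq 12\delta\sqrt{\tfrac{M}{2\pi}} + e^{-M/2}. \tag{5.18} \]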
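
The two estimates feeding (5.18) can be sanity-checked in closed form. The sketch below relies on a standard fact that is not spelled out in the text: if $a_1, b_1$ are independent $\mathcal{N}(0,\frac{1}{2M})$, then $a_1^2 + b_1^2$ is exponentially distributed with mean $\frac{1}{M}$, so the small-ball probability is exactly $1 - e^{-9M\delta^2}$. It also confirms the arithmetic behind the choice $t = M/2$ in the chi-squared tail bound. All function names are hypothetical.

```python
import math

def small_ball_probability(M, delta):
    """Exact Pr(|<e_1, phi>| < 3*delta) for a_1, b_1 ~ N(0, 1/(2M)) independent,
    using that a_1**2 + b_1**2 is exponential with mean 1/M."""
    return 1.0 - math.exp(-9.0 * M * delta ** 2)

def small_ball_bound(M, delta):
    """Upper bound 12 * delta * sqrt(M / (2*pi)) from (5.16)."""
    return 12.0 * delta * math.sqrt(M / (2.0 * math.pi))

def chi_square_threshold(M, t):
    """Threshold sqrt(8*M*t) + 2*t + 2*M in the tail bound with 2M degrees of freedom."""
    return math.sqrt(8.0 * M * t) + 2.0 * t + 2.0 * M

for M in (1, 10, 100):
    for delta in (0.001, 0.01, 0.1, 0.5, 1.0):
        exact = small_ball_probability(M, delta)
        bound = small_ball_bound(M, delta)
        print(M, delta, exact <= bound)

# With t = M/2 the threshold is 5M, which is what (5.17) uses.
print(chi_square_threshold(10, 5.0))
```

The bound (5.16) is loose for larger $\delta\sqrt{M}$, which is harmless here since the proof takes $\delta^2 = C/M$ with $C$ small.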



Now, to bound (|5.14j) , we will apply Hoeffding's inequality [23] , which says that the tail probability of a sum 
of independent Bernoulli random variables X,-, each with success probability p, has the following bound: 



Pr ( ^2 Xi > n(p + t)j < c 



-2nt 



Also, note that replacing p in the left-hand side above with some p' > p will not increase the probability. As 
such, taking p' to be the right-hand side of (|5.18[) and t = 1 — a — p', we have 



Pr(£ v ) = Pr(^2 1 {^Gs(v)} > (l-a)nj < exp ( - 2n((l - a) - (l25y 
'M_ , -M/2 
2ir TL 



Now that we have bounds on both \Afs\ and Pr(£„), we note that we specifically wish to show that 

Pr(pU(#;a) < < c qC - CiM 

for some constants cq, c\ > 0. Considering the above analysis and taking logarithms, it suffices to have 

log \Af s \ + logPr(£„) < 2Mlog(| + 1) - 2n(l - a) + 2n(l25^§ + c~ M / 2 ) < logc - Cl M 

with S 2 = jf. Since a < 1 — ^ by assumption, picking C < ^(1 — ~ a ) 2 wm make 2n(l — a) the 
dominant term in the above inequality, thereby proving the result. □ 

5.4. Removing large vertices. In this section, we prove how well we can remove the vertices with 
the largest noisy intensity measurements. We start with a lemma: 

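Lemma 5.10. Pick $C > 0$ and $n \geq e^{C/8}M\log M$. Draw the entries of an $M \times n$ matrix $\Phi = \{\varphi_i\}_{i=1}^{n}$ 
independently from $\mathcal{CN}(0, \frac{1}{M})$. Taking $\beta := 3e^{-C/8}$, then with overwhelming probability, 

\[ \#\big\{i : |\langle x,\varphi_i\rangle|^2 \geq \tfrac{C}{M}\big\} \leq \beta n \tag{5.19} \]

for every unit norm $x \in \mathbb{C}^M$. 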

Proof. Take a S-net Afs of the unit sphere in C M . Then for every unit vector x, there is a v x £ Afg such that 

\{x, tpi}\ < \{v x , + \{x - v x , tfi}\ < \(v x ,tpi)\ + \\x - v x \\ < \{v x , tpi)\ + 5. 

Taking 8 ~ we then have that \(x,ipi)\ 2 > jj implies \(v x ,ipi)\ 2 > jfj. Recall that we wish to 

bound the probability of the event £ that (|5.19l) is violated for some unit vector x. To this end, the above 
implication allows us to focus on a finite set of points: 



Pr(£) < Pr( 3v e N s s.t. £ l { ^ i)?> a_ } > [3n 



i=l 

As discussed in the proof of Lemma 5.9, we may take $|\mathcal{N}_\delta| \leq (\frac{2}{\delta} + 1)^{2M}$, and so the union bound and the 
symmetric distribution of $\Phi$ both give 

/ / N.2M / ™ \ 

Pr(£)<(V^ + l) Pr EidMI^)^™' (5-20) 

v i=l 7 

Similar to the proof of Lemma 5.9, we will bound the probability on the right-hand side using Hoeffding's 
inequality. First, we note that the symmetric distribution of $\varphi_i = \sum_{m=1}^{M}(a_m + ib_m)e_m$ gives that 

Pr(|<^>| 2 > ^) = Pr(|<e 1)¥ , i >| 2 > £j) =Pr(a 2 + 6 2 > ^) < Pr(a 2 > + Pr(& 2 > ^) , 

where the last step is by the union bound. We continue, using the fact that a\ and b± are both distributed 
as Af&m)- 

Pr(|( V ,^)| 2 > ^) < 2Pr(a 2 > ^) = 2Pr(z 2 > £ ) < 2e" c / 8 , 
where $z$ is a standard Gaussian random variable. With this, we now apply Hoeffding's inequality to (5.20): 

Pr(£) < {±\J% + l) 2M cxp(^ - 2n(fi - 2c" c/8 )) = exp^2M log {^\[^ + l) - 2ne~ c/8 ^ . 

17 



Since n > c c / 8 MlogM, then 2ne c ^ 8 is the dominant term above, thereby proving the result. □ 

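Theorem 5.11. Pick $C > 0$ and $n \geq e^{C/8}M\log M$. Draw the entries of an $M \times n$ matrix $\Phi = \{\varphi_i\}_{i=1}^{n}$ 
independently from $\mathcal{CN}(0, \frac{1}{M})$. Given a vector $x \in \mathbb{C}^M$, consider 

\[ z_i := |\langle x,\varphi_i\rangle|^2 + \nu_i, \]

where $\nu = \{\nu_i\}_{i=1}^{n}$ satisfies $\frac{\|\nu\|}{\|x\|^2} \leq \frac{C'}{\sqrt{M}}$. Taking $\beta := 3e^{-C/8}$, we have 

\[ \#\big\{i : z_i > \tfrac{2C}{M}\|x\|^2\big\} \leq 2\beta n \tag{5.21} \]

with overwhelming probability. 

Proof. In counting large $z_i$'s, we identify which come from large or small inner products $|\langle x,\varphi_i\rangle|^2$. First, 

\[ \big\{i : z_i > \tfrac{2C}{M}\|x\|^2\big\} \cap \big\{i : |\langle x,\varphi_i\rangle|^2 \geq \tfrac{C}{M}\|x\|^2\big\} \]

is of size $\leq \beta n$ with overwhelming probability by Lemma 5.10. The rest of the large $z_i$'s have indices in 

\[ \mathcal{K} := \big\{i : z_i > \tfrac{2C}{M}\|x\|^2\big\} \cap \big\{i : |\langle x,\varphi_i\rangle|^2 < \tfrac{C}{M}\|x\|^2\big\}. \]

To count these, note that $\nu_i = z_i - |\langle x,\varphi_i\rangle|^2 > \frac{C}{M}\|x\|^2$ for every $i \in \mathcal{K}$, and so 

\[ |\mathcal{K}|\frac{C^2}{M^2}\|x\|^4 \leq \sum_{i\in\mathcal{K}}|\nu_i|^2 \leq \|\nu\|^2 \leq \frac{C'^2}{M}\|x\|^4. \]

Rearranging then reveals that $|\mathcal{K}| = O(M)$, meaning $\mathcal{K}$ has fewer than $\beta n \geq 3M\log M$ members when $M$ is 
sufficiently large. □ 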

5.5. Main result. This section proves the main result of the paper, which we restate here: 

Theorem 5.12. Pick $N \sim CM\log M$ with $C$ sufficiently large, and take $\{\varphi_\ell\}_{\ell=1}^{N} = \Phi_V \cup \Phi_E$ defined in the 
measurement design of Section 3. Then there exist constants $C', K > 0$ such that the following guarantee 
holds for all $x \in \mathbb{C}^M$ with overwhelming probability: Consider measurements of the form 

\[ z_\ell := |\langle x,\varphi_\ell\rangle|^2 + \nu_\ell. \]

If the noise-to-signal ratio satisfies $\mathrm{NSR} := \frac{\|\nu\|}{\|x\|^2} \leq \frac{C'}{\sqrt{M}}$, then the phase retrieval procedure of Section 3 
produces an estimate $\tilde{x}$ from $\{z_\ell\}_{\ell=1}^{N}$ with squared relative error 

\[ \frac{\|\tilde{x} - e^{i\theta}x\|^2}{\|x\|^2} \leq K\sqrt{\frac{M}{\log M}}\,\mathrm{NSR} \]

for some phase $\theta \in [0, 2\pi)$. 

Proof. We will prove the result by considering the steps of our phase retrieval process in reverse order. In 
the last step, we have the following estimates of $\langle x,\varphi_i\rangle$ for every vertex $i \in V'' \subseteq V$ which survives our 
graph-pruning and large-vertex-removing processes: 

\[ y_i := e^{i\theta}\langle x,\varphi_i\rangle + \delta_i = (e^{i\theta}\Phi_{V''}^*x + \delta)_i. \]

Here, $\theta$ is a global phase which is calculated in the proof of Theorem 5.3. From these estimates, we reconstruct 
by finding the least-squares estimate of $x$: 

\[ \tilde{x} := (\Phi_{V''}\Phi_{V''}^*)^{-1}\Phi_{V''}y = e^{i\theta}x + (\Phi_{V''}\Phi_{V''}^*)^{-1}\Phi_{V''}\delta. \]

As such, we have the following bound on the reconstruction error: 

\[ \|\tilde{x} - e^{i\theta}x\| = \|(\Phi_{V''}\Phi_{V''}^*)^{-1}\Phi_{V''}\delta\| \leq \|(\Phi_{V''}\Phi_{V''}^*)^{-1}\Phi_{V''}\|_2\|\delta\| = \frac{\|\delta\|}{\sigma_{\min}(\Phi_{V''})}, \tag{5.22} \]

where the last equality holds with probability 1, specifically, in the event that $\sigma_{\min}(\Phi_{V''}) > 0$. 
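
The least-squares step and the bound (5.22) can be illustrated in the smallest nontrivial case. The sketch below uses $M = 2$ so that the $2\times 2$ Gram matrix $\Phi\Phi^*$ can be inverted by hand, without a linear algebra library. The signal, the phase $\theta$, the perturbation $\delta$ and every name in the code are hypothetical choices made only for this illustration.

```python
import cmath
import math
import random

rng = random.Random(2)
M, n = 2, 12  # tiny hypothetical sizes so the 2x2 inverse is explicit

def cgauss(var):
    """Complex Gaussian sample with total variance var."""
    s = math.sqrt(var / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))

def inner(u, v):
    """<u, v>, conjugate-linear in the second argument (assumed convention)."""
    return sum(um * vm.conjugate() for um, vm in zip(u, v))

phi = [[cgauss(1 / M) for _ in range(M)] for _ in range(n)]  # columns of Phi
x = [1.0 + 0.5j, -0.3 + 2.0j]
theta = 0.9
delta = [cgauss(1e-4) for _ in range(n)]
y = [cmath.exp(1j * theta) * inner(x, phi[i]) + delta[i] for i in range(n)]

# Gram matrix G = Phi Phi^* and right-hand side Phi y.
G = [[sum(phi[i][m] * phi[i][k].conjugate() for i in range(n)) for k in range(M)]
     for m in range(M)]
rhs = [sum(phi[i][m] * y[i] for i in range(n)) for m in range(M)]

# Explicit 2x2 solve: x_hat = G^{-1} rhs.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x_hat = [(G[1][1] * rhs[0] - G[0][1] * rhs[1]) / det,
         (G[0][0] * rhs[1] - G[1][0] * rhs[0]) / det]

# sigma_min(Phi)^2 is the smaller eigenvalue of the Hermitian matrix G.
tr = (G[0][0] + G[1][1]).real
sigma_min = math.sqrt((tr - math.sqrt(max(tr * tr - 4 * det.real, 0.0))) / 2)

err = math.sqrt(sum(abs(x_hat[m] - cmath.exp(1j * theta) * x[m]) ** 2 for m in range(M)))
bound = math.sqrt(sum(abs(d) ** 2 for d in delta)) / sigma_min
print(err, bound)
```

The printed error should not exceed the printed bound, reflecting that the operator norm of the pseudoinverse is $1/\sigma_{\min}$.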

To continue this bound, recall that the large- vertex- removing process ensured \V"\ = nn. We claim there 
exists a constant c > such that with overwhelming probability, cr m i n ($y„) > c ^J\ogM for every V" C V 
with \ V"\ = nn. To prove this, we leverage the complex version of Corollary 5.35 in [35], which gives that 
for a fixed V" C V, there exist c, d > such that 



Pr(<w($v») < ^(v 7 ^- y/M- Vt^)j < ce- c ' tn . 
Performing a union bound over the QjJ = (n^w) < ( jzr^)^ 1 '^ 71 choices for V" then gives 
PrfaV'CV, |y"| = «ns.t. (7 min ($^) < ^?(V^*- VM-y/in)\ < cexp(c'n( - | + (1 - ft) log^)) 



Provided ^ > (1 — «;)log(Yz — ), which occurs whenever k > 0.86, then taking i := § + (1 — K)log(j3— ) 
will ensure both t < n and | > (1 — /t)log(y3-). Therefore, this choice for t coupled with the fact that 
n = CMXogM proves the claim. Continuing (|5.22p then gives 

\\i-e ie x\\< P (5.23) 

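Next, we wish to bound $\|\delta\|$. By definition, we have 

\[ \delta_i = y_i - e^{i\theta}\langle x,\varphi_i\rangle = \sqrt{z_i}\,e^{i\arg(u_i)} - e^{i\theta}\langle x,\varphi_i\rangle. \]

Note that the above square root operates under the assumption that $z_i \geq 0$ for each $i \in V''$, which is ensured 
when we prune for reliability. Denote $\xi_i := \sqrt{z_i} - |\langle x,\varphi_i\rangle|$. Then by the triangle inequality, we have 

\[ |\delta_i| = \big|\sqrt{z_i}\,e^{i\arg(u_i)} - \sqrt{z_i}\,e^{i(\theta+\arg(\langle x,\varphi_i\rangle))} + \xi_ie^{i(\theta+\arg(\langle x,\varphi_i\rangle))}\big| \leq \sqrt{z_i}\,\big|e^{i\arg(u_i)} - e^{i(\theta+\arg(\langle x,\varphi_i\rangle))}\big| + |\xi_i|. \]

Next, factoring $e^{i(\theta+\arg(\langle x,\varphi_i\rangle))}$ and applying (5.4) yields 

\[ |\delta_i| \leq \sqrt{z_i}\,\|\arg(u_i) - \arg(\langle x,\varphi_i\rangle) - \theta\|_{\mathbb{T}} + |\xi_i|. \]

For any $a, b \geq 0$, then since $0 \leq (a-b)^2 = a^2 - 2ab + b^2$, we have $(a+b)^2 = a^2 + 2ab + b^2 \leq 2(a^2 + b^2)$. 
Applying this inequality to the right-hand side above then gives 

\[ |\delta_i|^2 \leq 2z_i\|\arg(u_i) - \arg(\langle x,\varphi_i\rangle) - \theta\|_{\mathbb{T}}^2 + 2\xi_i^2. \]

Similarly, for any $a, b \geq 0$, then since $ab \geq \min\{a^2, b^2\}$, we have $(a-b)^2 = a^2 - 2ab + b^2 \leq |a^2 - b^2|$. Applying 
this to $\xi_i^2 = (\sqrt{z_i} - |\langle x,\varphi_i\rangle|)^2$ then gives $\xi_i^2 \leq |z_i - |\langle x,\varphi_i\rangle|^2| = |\nu_i|$. Also by Theorem 5.11, our large-vertex-removing process 
ensures that $z_i \leq \frac{2C}{M}\|x\|^2$ for every $i \in V''$. Combined, these facts imply 

\[ \|\delta\|^2 \leq \frac{4C}{M}\|x\|^2\sum_{i\in V''}\|\arg(u_i) - \arg(\langle x,\varphi_i\rangle) - \theta\|_{\mathbb{T}}^2 + 2\|\nu\|_1. \tag{5.24} \]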

Applying Theorem 15.31 Definition 15.71 and Theorem 15.81 further gives 

^J|argK) arg((x,^)) »|| T < ^pr ^ Ai||x||*PUN(* v ; ~ dplRP { ' 
Recalling the definition of $\varepsilon$, we have 

k=0 ' k=0 

and so, letting E' denote the edges which remained after pruning for connectivity, the Cauchy-Schwarz 
inequality gives 

w 2 = E m 2 = E E fEK fc i 2 )fEi^i 2 )<^!iHi 2 . (5.26) 



{i,j}eE> {i,j}£E' 



3 

fe=0 



9 

{i,j}€E' x fc=0 ' x fe=0 



Finally, we combine (jQ3|) . (pT23]l . (|Qj|l and (f5T26|) . along with < V^IMI < V^CMlog M| 



\x-c w x\\ 2 < 4c lC2 71/ |H| 2 | 2\Z2C / H <g / A/ HI 



3c 2 c 2 A 2 logM ||x|| 4 c 2 yiogM||x|| 2 - Y logM ||a;|| 2 ' 
for some constant K. □ 